Presto and Athena to Delta Lake Integration

Presto and Athena support reading from external tables using a manifest file, which is a text file containing the list of data files to read for querying a table. When an external table is defined in the Hive metastore using manifest files, Presto and Athena can use the list of files in the manifest rather than finding the files by directory listing. This article describes how to set up a Presto and Athena to Delta Lake integration using manifest files and query Delta tables.

Set up the Presto or Athena to Delta Lake integration and query Delta tables

You set up a Presto or Athena to Delta Lake integration using the following steps.

Step 1: Generate manifests of a Delta table using Apache Spark

Using Spark configured with Delta Lake, run any of the following commands on a Delta table at location pathToDeltaTable:

SQL

GENERATE symlink_format_manifest FOR TABLE delta.`pathToDeltaTable`

See Enable SQL commands within Apache Spark for the steps to enable support for SQL commands in Apache Spark.

Scala

val deltaTable = DeltaTable.forPath(pathToDeltaTable)
deltaTable.generate("symlink_format_manifest")

Java

DeltaTable deltaTable = DeltaTable.forPath(pathToDeltaTable);
deltaTable.generate("symlink_format_manifest");

Python

deltaTable = DeltaTable.forPath(pathToDeltaTable)
deltaTable.generate("symlink_format_manifest")

These commands generate manifest files at pathToDeltaTable/_symlink_format_manifest/. In other words, the files in this directory will contain the names of the data files (that is, Parquet files) that should be read for reading a snapshot of the Delta table.

Note

We recommend that you define the Delta table in a location that Presto or Athena read directly.

Step 2: Configure Presto or Athena to read the generated manifests

  1. Define a new table in the Hive metastore connected to Presto or Athena using the special format SymlinkTextInputFormat and the manifest location pathToDeltaTable/_symlink_format_manifest/:

    CREATE EXTERNAL TABLE mytable ( ... )   -- same schema as the Delta table
    PARTITIONED BY ( ... )  -- optional, must be the same as the Delta table
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '<pathToDeltaTable>/_symlink_format_manifest/'  -- location of the generated manifest
    

    Replace <pathToDeltaTable> with the full path to the Delta table. The SymlinkTextInputFormat configures Presto or Athena to compute file splits for mytable by reading the manifest file instead of using a directory listing to find data files.

    Important

    • This table definition cannot be used in a query in Apache Spark. It can be used only by Presto and Athena. See later sections to find out how to define tables for Apache Spark and Presto or Athena to interoperate in an integrated environment.

    The tool you use to run the command depends on whether Apache Spark and Presto or Athena use the same Hive metastore.

    • Same metastore: If both Apache Spark and Presto or Athena use the same Hive metastore, you can define the table using Apache Spark.
    • Different metastores: If Apache Spark and Presto or Athena use different metastores, you must define the table using other tools.
      • Athena: You can define the external table in Athena.
      • Presto: Presto does not support the syntax CREATE EXTERNAL TABLE ... STORED AS ..., so you must use another tool (for example, Spark or Hive) connected to the same metastore as Presto to create the table.
  2. If the Delta table is partitioned, run MSCK REPAIR TABLE mytable after generating the manifests to force the metastore (connected to Presto or Athena) to discover the partitions. This is needed because the manifest of a partitioned table is itself partitioned in the same directory structure as the table. Run this command using the same tool used to create the table. Furthermore, you should run this command:

    • After every manifest generation: New partitions are likely to be visible immediately after the manifest files have been updated. However, doing this too frequently can cause high load for the Hive metastore.
    • As frequently as new partitions are expected: For example, if a table is partitioned by date, then you can run repair once after every midnight, after the new partition has been created in the table and its corresponding manifest files have been generated.

Step 3: Update manifests

When the data in a Delta table is updated, you must regenerate the manifests.

Limitations

The Presto and Athena integration has known limitations in its behavior.

Data consistency

Whenever Delta Lake generates updated manifests, it atomically overwrites existing manifest files. Therefore, Presto and Athena will always see a consistent view of the data files; it will see all of the old version files or all of the new version files. However, the granularity of the consistency guarantees depends on whether the table is partitioned or not.

  • Unpartitioned tables: All the files names are written in one manifest file which is updated atomically. In this case Presto and Athena will see full table snapshot consistency.
  • Partitioned tables: A manifest file is partitioned in the same Hive-partitioning-style directory structure as the original Delta table. This means that each partition is updated atomically, and Presto or Athena will see a consistent view of each partition but not a consistent view across partitions. Furthermore, since all manifests of all partitions cannot be updated together, concurrent attempts to generate manifests can lead to different partitions having manifests of different versions.

Depending on what storage system you are using for Delta tables, it is possible to get incorrect results when Presto or Athena concurrently queries the manifest while the manifest files are being rewritten. In file system implementations that lack atomic file overwrites, a manifest file may be momentarily unavailable. Hence, use manifests with caution if their updates are likely to coincide with queries from Presto or Athena.

Performance

Very large numbers of files can hurt the performance of Presto and Athena. Hence we recommend that you compact the files of the table before generating the manifests. We suggest that the number of files should not exceed 1000 (for the entire unpartitioned table or for each partition in a partitioned table).

Schema evolution

Delta Lake supports schema evolution and queries on a Delta table automatically use the latest schema regardless of the schema defined in the table in the Hive metastore. However, Presto or Athena uses the schema defined in the Hive metastore and will not query with the updated schema until the table used by Presto or Athena is redefined to have the updated schema.