Delta Kernel

The Delta Kernel project is a set of libraries (Java and Rust) for building Delta connectors that can read from and write to Delta tables without requiring you to understand the details of the Delta protocol.

You can use this library to do the following:

  • Read data from small Delta tables in a single thread in a single process.

  • Read data from large Delta tables using multiple threads in a single process.

  • Build a complex connector for a distributed processing engine and read very large Delta tables.

  • Insert data into a Delta table either from a single process or a complex distributed engine.

Here is an example of a simple table scan with a filter:

Engine myEngine = DefaultEngine.create();                   // define an engine (more details below)
Table myTable = Table.forPath("/delta/table/path");         // define what table to scan
Snapshot mySnapshot = myTable.getLatestSnapshot(myEngine);  // define which version of table to scan
Scan myScan = mySnapshot.getScanBuilder(myEngine)           // specify the scan details
  .withFilters(myEngine, scanFilter)
  .build();
CloseableIterator<ColumnarBatch> physicalData =             // read the Parquet data files
  .. read from Parquet data files ...
Scan.transformPhysicalData(...)                             // returns the table data

A complete version of the above example program and more examples of reading from and writing into a Delta table are available here.

Notice that there are two sets of public APIs to build connectors.

  • Table APIs - Interfaces like `Table` and `Snapshot` that allow you to read from and write to Delta tables.

  • Engine APIs - The `Engine` interface allows you to plug in connector-specific optimizations for the compute-intensive components in the Kernel. For example, Delta Kernel provides a default Parquet file reader via the `DefaultEngine`, but you may choose to replace that default with a custom `Engine` implementation that has a faster Parquet reader for your connector/processing engine.
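
The split between the two API sets can be illustrated with a small, self-contained sketch. Everything in it is a hypothetical stand-in written for this illustration: the nested `Engine` and `ParquetReader` interfaces, the `DefaultEngineSketch` and `CustomEngineSketch` classes, and the `scan` helper are not the Kernel's real types or signatures. It only shows the general pattern, in which table-level logic is written against an engine interface so that a connector can swap in its own reader without touching that logic.

```java
import java.util.ArrayList;
import java.util.List;

public class EngineSketch {
    /** Hypothetical stand-in for a Parquet-reading component (not the Kernel's real interface). */
    public interface ParquetReader {
        List<String> readRows(String filePath);
    }

    /** Hypothetical stand-in for an engine that hands out pluggable components. */
    public interface Engine {
        ParquetReader parquetReader();
    }

    /** Plays the role of a default engine: a simple, general-purpose reader. */
    public static class DefaultEngineSketch implements Engine {
        @Override
        public ParquetReader parquetReader() {
            return path -> List.of("default:" + path);
        }
    }

    /** A connector-specific engine that substitutes its own reader. */
    public static class CustomEngineSketch implements Engine {
        @Override
        public ParquetReader parquetReader() {
            return path -> List.of("custom:" + path);
        }
    }

    /** Table-level logic depends only on the Engine interface, never on a concrete reader. */
    public static List<String> scan(Engine engine, List<String> dataFiles) {
        List<String> rows = new ArrayList<>();
        ParquetReader reader = engine.parquetReader();
        for (String file : dataFiles) {
            rows.addAll(reader.readRows(file));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part-0.parquet", "part-1.parquet");
        // Same scan logic, two different engines.
        System.out.println(scan(new DefaultEngineSketch(), files));
        System.out.println(scan(new CustomEngineSketch(), files));
    }
}
```

In this sketch the `scan` method never changes when the engine does, which is the property the Engine APIs are meant to give a connector. The real interfaces cover more than file reading, so treat this as a picture of the design rather than a template for an actual `Engine` implementation.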

Kernel Java

The Java Kernel is a set of libraries for building Delta connectors in Java. See Kernel Java for more information.

Kernel Rust

The Rust Kernel is a set of libraries for building Delta connectors in native languages. See Kernel Rust for more information.

More Information

  • Talk explaining the rationale behind Kernel and the API design (the slides are available here and are kept up to date with the changes).

  • User guide on the step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector for reading and writing to Delta tables.

  • Example Java programs that illustrate how to read and write Delta tables using the Kernel APIs.

  • Table and default Engine API Java documentation

  • Migration guide