Welcome to the Delta Lake documentation
Note: Delta Lake 4.0 Preview is released! See the 4.0 Preview documentation.
This is the documentation site for Delta Lake.
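For orientation before the table of contents, here is a minimal sketch of writing, reading, and time traveling a Delta table with PySpark. It assumes a local Spark setup with the delta-spark package installed; the table path and app name are illustrative, and the Quickstart below covers setup in detail.

```python
from pyspark.sql import SparkSession

# Illustrative session config for Delta Lake (see "Configure SparkSession" below).
spark = (
    SparkSession.builder.appName("delta-orientation")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

# Write a small DataFrame as a Delta table; the path is hypothetical.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-table")

# Read the current version of the table.
spark.read.format("delta").load("/tmp/delta-table").show()

# Time travel: read an earlier snapshot of the table by version number.
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta-table").show()
```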
- Introduction
- Quickstart
- Table batch reads and writes
  - Create a table
  - Read a table
  - Query an older snapshot of a table (time travel)
  - Write to a table
  - Schema validation
  - Update table schema
  - Replace table schema
  - Views on tables
  - Table properties
  - Syncing table schema and properties to the Hive metastore
  - Table metadata
  - Configure SparkSession
  - Configure storage credentials
- Table streaming reads and writes
- Table deletes, updates, and merges
- Change data feed
- Table utility commands
  - Remove files no longer referenced by a Delta table
  - Retrieve Delta table history
  - Retrieve Delta table details
  - Generate a manifest file
  - Convert a Parquet table to a Delta table
  - Convert an Iceberg table to a Delta table
  - Convert a Delta table to a Parquet table
  - Restore a Delta table to an earlier state
  - Shallow clone a Delta table
  - Clone a Parquet or Iceberg table to Delta
- Constraints
- How does Delta Lake manage feature compatibility?
- Delta default column values
- Delta column mapping
- Use liquid clustering for Delta tables
- What are deletion vectors?
- Drop Delta table features
- Delta Lake APIs
- Delta Connect (also known as Spark Connect support in Delta)
- Storage configuration
- Delta Coordinated Commits
  - DynamoDB Commit Coordinator
- Delta type widening
- Universal Format (UniForm)
  - Requirements
  - Enable Delta Lake UniForm
  - When does UniForm generate metadata?
  - Check Iceberg/Hudi metadata generation status
  - Read UniForm tables as Iceberg tables in Apache Spark
  - Read UniForm tables as Iceberg tables using a metadata JSON path
  - Read UniForm tables as Hudi tables in Apache Spark
  - Delta and Iceberg/Hudi table versions
  - Limitations
- Read Delta Sharing tables
- Concurrency control
- Access Delta tables from external data processing engines
- Migration guide
- Best practices
- Frequently asked questions (FAQ)
  - What is Delta Lake?
  - How is Delta Lake related to Apache Spark?
  - What format does Delta Lake use to store data?
  - How can I read and write data with Delta Lake?
  - Where does Delta Lake store the data?
  - Can I copy my Delta Lake table to another location?
  - Can I stream data directly into and from Delta tables?
  - Does Delta Lake support writes or reads using the Spark Streaming DStream API?
  - When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
  - Does Delta Lake support multi-table transactions?
  - How can I change the type of a column?
- Releases
- Delta Lake resources
- Optimizations
- Delta table properties reference