Frequently asked questions (FAQ)

In this article:

What is Delta Lake?
How is Delta Lake related to Apache Spark?
What format does Delta Lake use to store data?
How can I read and write data with Delta Lake?
Where does Delta Lake store the data?
Can I stream data directly into and from Delta tables?
Does Delta Lake support writes or reads using the Spark Streaming DStream API?
When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
Does Delta Lake support multi-table transactions?
How can I change the type of a column?

What is Delta Lake?

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

What format does Delta Lake use to store data?

Delta Lake stores your data as versioned Parquet files in your cloud storage. Alongside the data files, Delta Lake stores a transaction log that records every commit made to the table or blob store directory. This log is what allows Delta Lake to provide ACID transactions.
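
As a rough illustration of how the transaction log relates to the Parquet files, the following Python sketch reads the log directly. It assumes the open source on-disk layout, in which each commit is a newline-delimited JSON file under a `_delta_log` directory inside the table directory, and it ignores checkpoints and other details. The function names `list_commits` and `active_files` are hypothetical helpers for this example, not part of any Delta Lake API.

```python
import json
from pathlib import Path


def list_commits(table_path):
    """Return the JSON commit files under _delta_log, ordered by version."""
    log_dir = Path(table_path) / "_delta_log"
    # Assumption: commit files are named by zero-padded version number,
    # for example 00000000000000000000.json, so a plain sort orders them.
    return sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())


def active_files(table_path):
    """Replay add/remove actions to find the data files in the latest version."""
    files = set()
    for commit in list_commits(table_path):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action per line
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other actions (commitInfo, metaData, protocol) are skipped here.
    return files
```

In practice you would read the table through Spark or another Delta Lake client rather than parsing the log yourself. The sketch is only meant to show why the log, and not the directory listing, determines which Parquet files belong to a given version of the table.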