Table protocol versioning
The transaction log for a Delta table contains protocol versioning information that supports Delta Lake evolution. Delta Lake tracks minimum reader and writer versions separately.
Delta Lake guarantees backward compatibility. A higher protocol version of the Delta Lake reader is always able to read data that was written by a lower protocol version.
Delta Lake will occasionally break forward compatibility. Lower protocol versions of Delta Lake may not be able to read and write data that was written by a higher protocol version of Delta Lake. If you try to read and write to a table with a protocol version of Delta Lake that is too low, you’ll get an error telling you that you need to upgrade.
When creating a table, Delta Lake chooses the minimum required protocol version based on table characteristics such as the schema or table properties. You can also set the default protocol versions by setting the SQL configurations:
spark.databricks.delta.properties.defaults.minWriterVersion = 2
(default)spark.databricks.delta.properties.defaults.minReaderVersion = 1
(default)
To upgrade a table to a newer protocol version, use the DeltaTable.upgradeTableProtocol
method:
Warning
Protocol version upgrades are irreversible, and upgrading the protocol version may break the existing Delta Lake table readers, writers, or both. Therefore, we recommend you upgrade specific tables only when needed, such as to opt-in to new features in Delta Lake. You should also check to make sure that all of your current and future production tools support Delta Lake tables with the new protocol version.
-- Upgrades the reader protocol version to 1 and the writer protocol version to 3.
ALTER TABLE <table_identifier> SET TBLPROPERTIES('delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '3')
from delta.tables import DeltaTable
delta = DeltaTable.forPath(spark, "path_to_table") # or DeltaTable.forName
delta.upgradeTableProtocol(1, 3) # Upgrades to readerVersion=1, writerVersion=3.
import io.delta.tables.DeltaTable
val delta = DeltaTable.forPath(spark, "path_to_table") // or DeltaTable.forName
delta.upgradeTableProtocol(1, 3) // Upgrades to readerVersion=1, writerVersion=3.
Features by protocol version
Feature | minWriterVersion |
minReaderVersion |
Introduced in | Documentation |
---|---|---|---|---|
Basic functionality | 2 | 1 | – | Delta Lake guide |
CHECK constraints |
3 | 1 | Delta Lake 0.8.0 | CHECK constraint |
Generated columns | 4 | 1 | Delta Lake 1.0.0 | Use generated columns |
Column mapping | 5 | 2 | Delta Lake 1.2.0 | Column mapping |
Change data feed | 4 | 1 | Delta Lake 2.0.0 | Change data feed |
See Requirements for Readers and Writer Version Requirements in the delta-io/delta repo on the GitHub website.
Column mapping
Note
This feature is available in Delta Lake 1.2.0 and above. This feature is currently experimental with known limitations.
Column mapping feature allows Delta table columns and the underlying Parquet file columns to use different names. This enables Delta schema evolution operations such as RENAME COLUMN and DROP COLUMNS on a Delta table without the need to rewrite the underlying Parquet files. It also allows users to name Delta table columns by using characters that are not allowed by Parquet, such as spaces, so that users can directly ingest CSV or JSON data into Delta without the need to rename columns due to previous character constraints.
Column mapping requires upgrading the Delta Lake table protocol.
Warning
Protocol version upgrades are irreversible, and upgrading the protocol version may break the existing Delta Lake table readers, writers, or both. Therefore, we recommend you upgrade specific tables only when needed, such as to opt-in to new features in Delta Lake. You should also check to make sure that all of your current and future production tools support Delta Lake tables with the new protocol version.
ALTER TABLE <table_name> SET TBLPROPERTIES (
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5',
'delta.columnMapping.mode' = 'name'
)
Known limitations
- In Delta Lake 2.0.0, Spark Structured Streaming and Change data feed reads are explicitly blocked on a column mapping enabled table.
- The Delta table protocol specifies two modes of column mapping, by
name
and byid
. Currently in Delta Lake only thename
mode is supported.