Table protocol versioning

The transaction log for a Delta table contains protocol versioning information that supports Delta Lake evolution. Delta Lake tracks minimum reader and writer versions separately.
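As a rough illustration of where this information lives, the log records it as a `protocol` action with `minReaderVersion` and `minWriterVersion` fields. The sketch below parses one such JSON line with plain Python. The sample line is made up for illustration, and `Protocol` and `read_protocol` are hypothetical helper names, not part of the Delta Lake API.

```python
import json
from typing import NamedTuple, Optional


class Protocol(NamedTuple):
    min_reader_version: int
    min_writer_version: int


def read_protocol(log_line: str) -> Optional[Protocol]:
    """Return the protocol versions if this log line is a protocol action."""
    action = json.loads(log_line)
    protocol = action.get("protocol")
    if protocol is None:
        # Some other kind of action (for example, one that adds a data file).
        return None
    return Protocol(protocol["minReaderVersion"], protocol["minWriterVersion"])


# Illustrative log line, written by hand for this example.
sample = '{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}'
print(read_protocol(sample))
```

Keeping the reader and writer versions as two separate fields is what lets a table raise its writer requirement without locking out older readers.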

Delta Lake guarantees backward compatibility. A higher protocol version of the Delta Lake reader is always able to read data that was written by a lower protocol version.

Delta Lake will occasionally break forward compatibility. Lower protocol versions of Delta Lake may not be able to read or write data that was written by a higher protocol version of Delta Lake. If you try to read from or write to a table using a Delta Lake version whose supported protocol version is too low, you get an error telling you that you need to upgrade.
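The compatibility rules above can be sketched as a pair of version comparisons. This is a simplified model rather than Delta Lake's actual implementation: `TableProtocol`, `Client`, `can_read`, and `can_write` are hypothetical names, and the rule that a writer must also satisfy the reader version is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableProtocol:
    min_reader_version: int
    min_writer_version: int


@dataclass(frozen=True)
class Client:
    reader_version: int  # highest reader protocol version the client implements
    writer_version: int  # highest writer protocol version the client implements


def can_read(client: Client, table: TableProtocol) -> bool:
    # Backward compatibility: a newer reader can always read an older table.
    return client.reader_version >= table.min_reader_version


def can_write(client: Client, table: TableProtocol) -> bool:
    # Assumption: a writer has to read the current table state before writing.
    return can_read(client, table) and (
        client.writer_version >= table.min_writer_version
    )


old_client = Client(reader_version=1, writer_version=2)
upgraded_table = TableProtocol(min_reader_version=1, min_writer_version=3)
print(can_read(old_client, upgraded_table), can_write(old_client, upgraded_table))
```

In this model, raising only the writer version leaves older readers working while older writers are rejected, which is why the two versions are tracked separately.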

When creating a table, Delta Lake chooses the minimum required protocol version based on table characteristics such as the schema or table properties. You can also set the default protocol versions by setting the SQL configurations:

  • spark.databricks.delta.properties.defaults.minWriterVersion = 2 (default)
  • spark.databricks.delta.properties.defaults.minReaderVersion = 1 (default)
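As a minimal sketch, the two configuration keys listed above can be assembled in Python before applying them to a session. `default_protocol_confs` is a hypothetical helper, and the commented usage assumes an existing SparkSession named `spark` that is not created here.

```python
from typing import Dict

CONF_PREFIX = "spark.databricks.delta.properties.defaults."


def default_protocol_confs(
    min_reader_version: int = 1, min_writer_version: int = 2
) -> Dict[str, str]:
    """Build the SQL configuration entries for default protocol versions."""
    for value in (min_reader_version, min_writer_version):
        if not isinstance(value, int) or value < 1:
            raise ValueError("protocol versions must be positive integers")
    return {
        CONF_PREFIX + "minReaderVersion": str(min_reader_version),
        CONF_PREFIX + "minWriterVersion": str(min_writer_version),
    }


confs = default_protocol_confs(min_reader_version=1, min_writer_version=3)

# Usage with an existing SparkSession named `spark` (assumed, not created here):
# for key, value in confs.items():
#     spark.conf.set(key, value)
```

These defaults affect tables created after the configuration is set; an existing table keeps the protocol version recorded in its own log.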

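To upgrade a table to a newer protocol version, set the delta.minReaderVersion and delta.minWriterVersion table properties in SQL, or use the DeltaTable.upgradeTableProtocol method in Python or Scala: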

Warning

Protocol version upgrades are irreversible, and upgrading the protocol version may break the existing Delta Lake table readers, writers, or both. Therefore, we recommend that you upgrade specific tables only when needed, such as to opt in to new features in Delta Lake. You should also make sure that all of your current and future production tools support Delta Lake tables with the new protocol version.

SQL

-- Upgrades the reader protocol version to 1 and the writer protocol version to 3.
ALTER TABLE <table_identifier> SET TBLPROPERTIES('delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '3')

Python

from delta.tables import DeltaTable
delta = DeltaTable.forPath(spark, "path_to_table") # or DeltaTable.forName
delta.upgradeTableProtocol(1, 3) # Upgrades to readerVersion=1, writerVersion=3.

Scala

import io.delta.tables.DeltaTable
val delta = DeltaTable.forPath(spark, "path_to_table") // or DeltaTable.forName
delta.upgradeTableProtocol(1, 3) // Upgrades to readerVersion=1, writerVersion=3.

Features by protocol version

Feature              minWriterVersion  minReaderVersion  Introduced in     Documentation
Basic functionality  2                 1                 -                 Delta Lake guide
CHECK constraints    3                 1                 Delta Lake 0.8.0  CHECK constraint
Generated columns    4                 1                 Delta Lake 1.0.0  Use generated columns
Column mapping       5                 2                 Delta Lake 1.2.0  Column mapping
Change data feed     4                 1                 Delta Lake 2.0.0  Change data feed

See Requirements for Readers and Writer Version Requirements in the delta-io/delta repo on the GitHub website.
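To work out which protocol versions a combination of features needs, one approach is to take the highest reader and writer version across the features in use. The sketch below encodes the feature table from this section as a Python dictionary. `FEATURE_REQUIREMENTS` and `required_protocol` are hypothetical names for illustration, and taking the maximum per version is an assumption about how the requirements combine.

```python
from typing import Dict, Iterable, Tuple

# (minReaderVersion, minWriterVersion) per feature, copied from the table above.
FEATURE_REQUIREMENTS: Dict[str, Tuple[int, int]] = {
    "basic functionality": (1, 2),
    "check constraints": (1, 3),
    "generated columns": (1, 4),
    "column mapping": (2, 5),
    "change data feed": (1, 4),
}


def required_protocol(features: Iterable[str]) -> Tuple[int, int]:
    """Return the lowest (reader, writer) versions that cover all features."""
    reader, writer = FEATURE_REQUIREMENTS["basic functionality"]
    for feature in features:
        feature_reader, feature_writer = FEATURE_REQUIREMENTS[feature.lower()]
        reader = max(reader, feature_reader)
        writer = max(writer, feature_writer)
    return reader, writer


reader, writer = required_protocol(["CHECK constraints", "Change data feed"])
print(f"minReaderVersion={reader}, minWriterVersion={writer}")
```

A lookup like this can help when checking that every tool that touches the table supports the resulting versions before you upgrade.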

Column mapping

Note

This feature is available in Delta Lake 1.2.0 and above. This feature is currently experimental with known limitations.

The column mapping feature allows Delta table columns and the underlying Parquet file columns to use different names. This enables Delta schema evolution operations such as RENAME COLUMN and DROP COLUMNS on a Delta table without rewriting the underlying Parquet files. It also lets you name Delta table columns with characters that Parquet does not allow, such as spaces, so that you can ingest CSV or JSON data directly into Delta without first renaming columns to satisfy those character constraints.
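The idea can be pictured with a toy model in plain Python: the table keeps a mapping from logical column names to physical names, so a rename or drop changes only that mapping while the stored data stays untouched. `MappedSchema` and the `col-<uuid>` naming scheme are assumptions made for this sketch, not a description of Delta Lake internals.

```python
import uuid
from typing import Any, Dict, Iterable


class MappedSchema:
    """Toy logical-to-physical column name mapping."""

    def __init__(self, logical_names: Iterable[str]) -> None:
        # Physical names are generated, so logical names may contain spaces.
        self._physical = {name: f"col-{uuid.uuid4()}" for name in logical_names}

    def physical_name(self, logical: str) -> str:
        return self._physical[logical]

    def rename_column(self, old: str, new: str) -> None:
        if new in self._physical:
            raise ValueError(f"column {new!r} already exists")
        # Only the mapping changes; stored data keeps its physical name.
        self._physical[new] = self._physical.pop(old)

    def drop_column(self, name: str) -> None:
        # The physical column stays in the files but is no longer visible.
        del self._physical[name]

    def read_row(self, stored_row: Dict[str, Any]) -> Dict[str, Any]:
        return {
            logical: stored_row[physical]
            for logical, physical in self._physical.items()
        }


schema = MappedSchema(["first name", "age"])
stored = {
    schema.physical_name("first name"): "Ada",
    schema.physical_name("age"): 36,
}
schema.rename_column("first name", "given_name")
print(schema.read_row(stored))
```

The `stored` dictionary stands in for a Parquet file: it is never modified, yet the row read back after the rename uses the new column name.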

Column mapping requires upgrading the Delta Lake table protocol.

Warning

Protocol version upgrades are irreversible, and upgrading the protocol version may break the existing Delta Lake table readers, writers, or both. Therefore, we recommend that you upgrade specific tables only when needed, such as to opt in to new features in Delta Lake. You should also make sure that all of your current and future production tools support Delta Lake tables with the new protocol version.

ALTER TABLE <table_name> SET TBLPROPERTIES (
   'delta.minReaderVersion' = '2',
   'delta.minWriterVersion' = '5',
   'delta.columnMapping.mode' = 'name'
)

Known limitations

  • In Delta Lake 2.0.0, Spark Structured Streaming reads and change data feed reads are explicitly blocked on tables that have column mapping enabled.
  • The Delta table protocol specifies two modes of column mapping, by name and by id. Currently in Delta Lake only the name mode is supported.