Package io.delta.kernel
Interface Transaction
Represents a transaction to mutate a Delta table.
- Since:
- 3.2.0
-
Method Summary
Modifier and TypeMethodDescriptionvoid
addDomainMetadata
(String domain, String config) Commit the provided domain metadata as part of this transaction.commit
(Engine engine, CloseableIterable<Row> dataActions) Commit the transaction including the data action rows generated bygenerateAppendActions(io.delta.kernel.engine.Engine, io.delta.kernel.data.Row, io.delta.kernel.utils.CloseableIterator<io.delta.kernel.utils.DataFileStatus>, io.delta.kernel.DataWriteContext)
.static CloseableIterator<Row>
generateAppendActions
(Engine engine, Row transactionState, CloseableIterator<DataFileStatus> fileStatusIter, DataWriteContext dataWriteContext) For given data files, generate Delta actions that can be committed in a transaction.getPartitionColumns
(Engine engine) Get the list of logical names of the partition columns.long
Gets the latest version of the table used as the base of this transaction.Get the schema of the table.getTransactionState
(Engine engine) Get the state of the transaction.static DataWriteContext
Get the context for writing data into a table.void
removeDomainMetadata
(String domain) Mark the domain metadata with identifierdomain
as removed in this transaction.transformLogicalData
(Engine engine, Row transactionState, CloseableIterator<FilteredColumnarBatch> dataIter, Map<String, Literal> partitionValues) Given the logical data that needs to be written to the table, convert it into the required physical data depending upon the table Delta protocol and features enabled on the table.
-
Method Details
-
getSchema
Get the schema of the table. If the connector is adding any data to the table through this transaction, it should have the same schema as the table schema. -
getPartitionColumns
Get the list of logical names of the partition columns. This helps the connector to do physical partitioning of the data before asking the Kernel to stage the data per partition. -
getReadTableVersion
long getReadTableVersion()Gets the latest version of the table used as the base of this transaction. This returns -1 when the table is being created in this transaction.- Returns:
- The version of the table as of the beginning of this Transaction
-
getTransactionState
Get the state of the transaction. The state helps Kernel do the transformations to logical data according to the Delta protocol and table features enabled on the table. The engine should use this at the data writer task to transform the logical data that the engine wants to write to the table in to physical data that goes in data files usingtransformLogicalData(Engine, Row, CloseableIterator, Map)
-
commit
TransactionCommitResult commit(Engine engine, CloseableIterable<Row> dataActions) throws ConcurrentWriteException Commit the transaction including the data action rows generated bygenerateAppendActions(io.delta.kernel.engine.Engine, io.delta.kernel.data.Row, io.delta.kernel.utils.CloseableIterator<io.delta.kernel.utils.DataFileStatus>, io.delta.kernel.DataWriteContext)
.- Parameters:
engine
-Engine
instance.dataActions
- Iterable of data actions to commit. These data actions are generated by thegenerateAppendActions(Engine, Row, CloseableIterator, DataWriteContext)
. TheCloseableIterable
allows the Kernel to access the list of actions multiple times (in case of retries to resolve the conflicts due to other writers to the table). Kernel provides a in-memory based implementation ofCloseableIterable
with utility APICloseableIterable.inMemoryIterable(CloseableIterator)
- Returns:
TransactionCommitResult
status of the successful transaction.- Throws:
ConcurrentWriteException
- when the transaction has encountered a non-retryable conflicts or exceeded the maximum number of retries reached. The connector needs to rerun the query on top of the latest table state and retry the transaction.
-
addDomainMetadata
Commit the provided domain metadata as part of this transaction. If this is called more than once with the samedomain
the latest providedconfig
will be committed in the transaction. Only user-controlled domains are allowed (aka. domains with a `delta.` prefix are not allowed). Adding and removing a domain with the same identifier in the same txn is not allowed. Adding domain metadata to a table that does not support the table feature is not allowed. To enable the table feature, make sure to callTransactionBuilder.withDomainMetadataSupported()
- Parameters:
domain
- the domain identifierconfig
- configuration string for this domain
-
removeDomainMetadata
Mark the domain metadata with identifierdomain
as removed in this transaction. If this domain does not exist in the latest version of the table, callingcommit(Engine, CloseableIterable)
will throw aDomainDoesNotExistException
. Adding and removing a domain with the same identifier in one txn is not allowed.- Parameters:
domain
- the domain identifier for the domain to remove
-
transformLogicalData
static CloseableIterator<FilteredColumnarBatch> transformLogicalData(Engine engine, Row transactionState, CloseableIterator<FilteredColumnarBatch> dataIter, Map<String, Literal> partitionValues) Given the logical data that needs to be written to the table, convert it into the required physical data depending upon the table Delta protocol and features enabled on the table. Kernel takes care of adding any additional column or removing existing columns that doesn't need to be in physical data files. All these transformations are driven by the Delta protocol and table features enabled on the table.The given data should belong to exactly one partition. It is the job of the connector to do partitioning of the data before calling the API. Partition values are provided as map of column name to partition value (as
Literal
). If the table is an un-partitioned table, then map should be empty.- Parameters:
engine
-Engine
instance to use.transactionState
- The transaction statedataIter
- Iterator of logical data (with schema same as the table schema) to transform to physical data. All the data n this iterator should belong to one physical partition and it should also include the partition data.partitionValues
- The partition values for the data. If the table is un-partitioned, the map should be empty- Returns:
- Iterator of physical data to write to the data files.
-
getWriteContext
static DataWriteContext getWriteContext(Engine engine, Row transactionState, Map<String, Literal> partitionValues) Get the context for writing data into a table. The context tells the connector where the data should be written. For partitioned table context is generated per partition. So, the connector should call this API for each partition. For un-partitioned table, the context is same for all the data.- Parameters:
engine
-Engine
instance to use.transactionState
- The transaction statepartitionValues
- The partition values for the data. If the table is un-partitioned, the map should be empty- Returns:
DataWriteContext
containing metadata about where and how the data for partition should be written.
-
generateAppendActions
static CloseableIterator<Row> generateAppendActions(Engine engine, Row transactionState, CloseableIterator<DataFileStatus> fileStatusIter, DataWriteContext dataWriteContext) For given data files, generate Delta actions that can be committed in a transaction. These data files are the result of writing the data returned bytransformLogicalData(io.delta.kernel.engine.Engine, io.delta.kernel.data.Row, io.delta.kernel.utils.CloseableIterator<io.delta.kernel.data.FilteredColumnarBatch>, java.util.Map<java.lang.String, io.delta.kernel.expressions.Literal>)
with the context returned bygetWriteContext(io.delta.kernel.engine.Engine, io.delta.kernel.data.Row, java.util.Map<java.lang.String, io.delta.kernel.expressions.Literal>)
.- Parameters:
engine
-Engine
instance.transactionState
- State of the transaction.fileStatusIter
- Iterator of row objects representing each data file written. Whendelta.icebergCompatV2
is enabled, each data file status should containDataFileStatistics
with at least theDataFileStatistics.getNumRecords()
field set.dataWriteContext
- The context used when writing the data files given infileStatusIter
- Returns:
CloseableIterator
ofRow
representing the actions to commit usingcommit(io.delta.kernel.engine.Engine, io.delta.kernel.utils.CloseableIterable<io.delta.kernel.data.Row>)
.
-