IN
- Type of the elements in the input of the sink that are also the elements to be
written to its outputpublic class DeltaSink<IN>
extends <any>
DeltaLog
. This sink achieves exactly-once
semantics for both BATCH
and STREAMING
.
For most use cases users should use forRowData(org.apache.flink.core.fs.Path, org.apache.hadoop.conf.Configuration, org.apache.flink.table.types.logical.RowType)
utility method to instantiate
the sink which provides proper writer factory implementation for the stream of RowData
.
To create new instance of the sink to a non-partitioned Delta table for stream of
RowData
:
DataStream<RowData> stream = ...; RowType rowType = ...; ... // sets a sink to a non-partitioned Delta table DeltaSink<RowData> deltaSink = DeltaSink.forRowData( new Path(deltaTablePath), new Configuration(), rowType).build(); stream.sinkTo(deltaSink);To create new instance of the sink to a partitioned Delta table for stream of
RowData
:
String[] partitionCols = ...; // array of partition columns' names DeltaSink<RowData> deltaSink = DeltaSink.forRowData( new Path(deltaTablePath), new Configuration(), rowType) .withPartitionColumns(partitionCols) .build(); stream.sinkTo(deltaSink);
Behaviour of this sink splits down upon two phases. The first phase takes place between
application's checkpoints when records are being flushed to files (or appended to writers'
buffers) where the behaviour is almost identical as in case of
org.apache.flink.connector.file.sink.FileSink
.
Next during the checkpoint phase files are "closed" (renamed) by the independent instances of
io.delta.flink.sink.internal.committer.DeltaCommitter
that behave very similar
to org.apache.flink.connector.file.sink.committer.FileCommitter
.
When all the parallel committers are done, then all the files are committed at once by
single-parallelism io.delta.flink.sink.internal.committer.DeltaGlobalCommitter
.
Modifier and Type | Method and Description |
---|---|
static RowDataDeltaSinkBuilder |
forRowData(org.apache.flink.core.fs.Path basePath,
org.apache.hadoop.conf.Configuration conf,
org.apache.flink.table.types.logical.RowType rowType)
Convenience method for creating a
RowDataDeltaSinkBuilder for DeltaSink to a
Delta table. |
public static RowDataDeltaSinkBuilder forRowData(org.apache.flink.core.fs.Path basePath, org.apache.hadoop.conf.Configuration conf, org.apache.flink.table.types.logical.RowType rowType)
RowDataDeltaSinkBuilder
for DeltaSink
to a
Delta table.basePath
- root path of the Delta tableconf
- Hadoop's conf object that will be used for creating instances of
DeltaLog
and will be also passed to the
ParquetRowDataBuilder
to create ParquetWriterFactory
rowType
- Flink's logical type to indicate the structure of the events in the stream