class DeltaTable extends DeltaTableOperations with Serializable
Main class for programmatically interacting with Delta tables. You can create DeltaTable instances using the static methods.
DeltaTable.forPath(sparkSession, pathToTheDeltaTable)
- Since
0.3.0
- Alphabetic
- By Inheritance
- DeltaTable
- Serializable
- Serializable
- DeltaTableOperations
- AnalysisHelper
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
addFeatureSupport(featureName: String): Unit
Modify the protocol to add a supported feature, and if the table does not support table features, upgrade the protocol automatically.
Modify the protocol to add a supported feature, and if the table does not support table features, upgrade the protocol automatically. In such a case when the provided feature is writer-only, the table's writer version will be upgraded to
7
, and when the provided feature is reader-writer, both reader and writer versions will be upgraded, to(3, 7)
.See online documentation and Delta's protocol specification at PROTOCOL.md for more details.
- Since
2.3.0
-
def
alias(alias: String): DeltaTable
Apply an alias to the DeltaTable.
Apply an alias to the DeltaTable. This is similar to
Dataset.as(alias)
or SQLtableName AS alias
.- Since
0.3.0
-
def
as(alias: String): DeltaTable
Apply an alias to the DeltaTable.
Apply an alias to the DeltaTable. This is similar to
Dataset.as(alias)
or SQLtableName AS alias
.- Since
0.3.0
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
delete(): Unit
Delete data from the table.
Delete data from the table.
- Since
0.3.0
-
def
delete(condition: Column): Unit
Delete data from the table that match the given
condition
.Delete data from the table that match the given
condition
.- condition
Boolean SQL expression
- Since
0.3.0
-
def
delete(condition: String): Unit
Delete data from the table that match the given
condition
.Delete data from the table that match the given
condition
.- condition
Boolean SQL expression
- Since
0.3.0
-
def
deltaLog: DeltaLog
- Attributes
- protected
-
def
detail(): DataFrame
:: Evolving ::
:: Evolving ::
Get the details of a Delta table such as the format, name, and size.
- Annotations
- @Evolving()
- Since
2.1.0
-
def
df: Dataset[Row]
- Attributes
- protected
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
executeDelete(condition: Option[Expression]): Unit
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeDetails(path: String, tableIdentifier: Option[TableIdentifier]): DataFrame
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeGenerate(tblIdentifier: String, mode: String): Unit
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeHistory(deltaLog: DeltaLog, limit: Option[Int], tableId: Option[TableIdentifier]): DataFrame
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeRestore(table: DeltaTableV2, versionAsOf: Option[Long], timestampAsOf: Option[String]): DataFrame
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeUpdate(set: Map[String, Column], condition: Option[Column]): Unit
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
executeVacuum(deltaLog: DeltaLog, retentionHours: Option[Double], tableId: Option[TableIdentifier]): DataFrame
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
generate(mode: String): Unit
Generate a manifest for the given Delta Table
Generate a manifest for the given Delta Table
- mode
Specifies the mode for the generation of the manifest. The valid modes are as follows (not case sensitive):
- "symlink_format_manifest" : This will generate manifests in symlink format for Presto and Athena read support. See the online documentation for more information.
- Since
0.5.0
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
history(): DataFrame
Get the information available commits on this table as a Spark DataFrame.
Get the information available commits on this table as a Spark DataFrame. The information is in reverse chronological order.
- Since
0.3.0
-
def
history(limit: Int): DataFrame
Get the information of the latest
limit
commits on this table as a Spark DataFrame.Get the information of the latest
limit
commits on this table as a Spark DataFrame. The information is in reverse chronological order.- limit
The number of previous commands to get history for
- Since
0.3.0
-
def
improveUnsupportedOpError(f: ⇒ Unit): Unit
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
merge(source: DataFrame, condition: Column): DeltaMergeBuilder
Merge data from the
source
DataFrame based on the given mergecondition
.Merge data from the
source
DataFrame based on the given mergecondition
. This returns a DeltaMergeBuilder object that can be used to specify the update, delete, or insert actions to be performed on rows based on whether the rows matched the condition or not.See the DeltaMergeBuilder for a full description of this operation and what combinations of update, delete and insert operations are allowed.
Scala example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable .as("target") .merge( source.as("source"), "target.key = source.key") .whenMatched .updateExpr(Map( "value" -> "source.value")) .whenNotMatched .insertExpr(Map( "key" -> "source.key", "value" -> "source.value")) .execute()
Java example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable .as("target") .merge( source.as("source"), "target.key = source.key") .whenMatched .updateExpr( new HashMap<String, String>() {{ put("value" -> "source.value") }}) .whenNotMatched .insertExpr( new HashMap<String, String>() {{ put("key", "source.key"); put("value", "source.value"); }}) .execute()
- source
source Dataframe to be merged.
- condition
boolean expression as a Column object
- Since
0.3.0
-
def
merge(source: DataFrame, condition: String): DeltaMergeBuilder
Merge data from the
source
DataFrame based on the given mergecondition
.Merge data from the
source
DataFrame based on the given mergecondition
. This returns a DeltaMergeBuilder object that can be used to specify the update, delete, or insert actions to be performed on rows based on whether the rows matched the condition or not.See the DeltaMergeBuilder for a full description of this operation and what combinations of update, delete and insert operations are allowed.
Scala example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable .as("target") .merge( source.as("source"), "target.key = source.key") .whenMatched .updateExpr(Map( "value" -> "source.value")) .whenNotMatched .insertExpr(Map( "key" -> "source.key", "value" -> "source.value")) .execute()
Java example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable .as("target") .merge( source.as("source"), "target.key = source.key") .whenMatched .updateExpr( new HashMap<String, String>() {{ put("value" -> "source.value"); }}) .whenNotMatched .insertExpr( new HashMap<String, String>() {{ put("key", "source.key"); put("value", "source.value"); }}) .execute();
- source
source Dataframe to be merged.
- condition
boolean expression as SQL formatted string
- Since
0.3.0
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
optimize(): DeltaOptimizeBuilder
Optimize the data layout of the table.
Optimize the data layout of the table. This returns a DeltaOptimizeBuilder object that can be used to specify the partition filter to limit the scope of optimize and also execute different optimization techniques such as file compaction or order data using Z-Order curves.
See the DeltaOptimizeBuilder for a full description of this operation.
Scala example to run file compaction on a subset of partitions in the table:
deltaTable .optimize() .where("date='2021-11-18'") .executeCompaction();
- Since
2.0.0
-
def
resolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planProvidingAttrs: LogicalPlan): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
restoreToTimestamp(timestamp: String): DataFrame
Restore the DeltaTable to an older version of the table specified by a timestamp.
Restore the DeltaTable to an older version of the table specified by a timestamp.
Timestamp can be of the format yyyy-MM-dd or yyyy-MM-dd HH:mm:ss
An example would be
io.delta.tables.DeltaTable.restoreToTimestamp("2019-01-01")
- Since
1.2.0
-
def
restoreToVersion(version: Long): DataFrame
Restore the DeltaTable to an older version of the table specified by version number.
Restore the DeltaTable to an older version of the table specified by version number.
An example would be
io.delta.tables.DeltaTable.restoreToVersion(7)
- Since
1.2.0
-
def
sparkSession: SparkSession
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toDF: Dataset[Row]
Get a DataFrame (that is, Dataset[Row]) representation of this Delta table.
Get a DataFrame (that is, Dataset[Row]) representation of this Delta table.
- Since
0.3.0
-
def
toDataset(sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[Row]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
toStrColumnMap(map: Map[String, String]): Map[String, Column]
- Attributes
- protected
- Definition Classes
- DeltaTableOperations
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
tryResolveReferences(sparkSession: SparkSession)(expr: Expression, planContainingExpr: LogicalPlan): Expression
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
tryResolveReferencesForExpressions(sparkSession: SparkSession)(exprs: Seq[Expression], plansProvidingAttrs: Seq[LogicalPlan]): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
tryResolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planContainingExpr: LogicalPlan): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
-
def
update(condition: Column, set: Map[String, Column]): Unit
Update data from the table on the rows that match the given
condition
based on the rules defined byset
.Update data from the table on the rows that match the given
condition
based on the rules defined byset
.Java example to increment the column
data
.import org.apache.spark.sql.Column; import org.apache.spark.sql.functions; deltaTable.update( functions.col("date").gt("2018-01-01"), new HashMap<String, Column>() {{ put("data", functions.col("data").plus(1)); }} );
- condition
boolean expression as Column object specifying which rows to update.
- set
rules to update a row as a Java map between target column names and corresponding update expressions as Column objects.
- Since
0.3.0
-
def
update(condition: Column, set: Map[String, Column]): Unit
Update data from the table on the rows that match the given
condition
based on the rules defined byset
.Update data from the table on the rows that match the given
condition
based on the rules defined byset
.Scala example to increment the column
data
.import org.apache.spark.sql.functions._ deltaTable.update( col("date") > "2018-01-01", Map("data" -> col("data") + 1))
- condition
boolean expression as Column object specifying which rows to update.
- set
rules to update a row as a Scala map between target column names and corresponding update expressions as Column objects.
- Since
0.3.0
-
def
update(set: Map[String, Column]): Unit
Update rows in the table based on the rules defined by
set
.Update rows in the table based on the rules defined by
set
.Java example to increment the column
data
.import org.apache.spark.sql.Column; import org.apache.spark.sql.functions; deltaTable.update( new HashMap<String, Column>() {{ put("data", functions.col("data").plus(1)); }} );
- set
rules to update a row as a Java map between target column names and corresponding update expressions as Column objects.
- Since
0.3.0
-
def
update(set: Map[String, Column]): Unit
Update rows in the table based on the rules defined by
set
.Update rows in the table based on the rules defined by
set
.Scala example to increment the column
data
.import org.apache.spark.sql.functions._ deltaTable.update(Map("data" -> col("data") + 1))
- set
rules to update a row as a Scala map between target column names and corresponding update expressions as Column objects.
- Since
0.3.0
-
def
updateExpr(condition: String, set: Map[String, String]): Unit
Update data from the table on the rows that match the given
condition
, which performs the rules defined byset
.Update data from the table on the rows that match the given
condition
, which performs the rules defined byset
.Java example to increment the column
data
.deltaTable.update( "date > '2018-01-01'", new HashMap<String, String>() {{ put("data", "data + 1"); }} );
- condition
boolean expression as SQL formatted string object specifying which rows to update.
- set
rules to update a row as a Java map between target column names and corresponding update expressions as SQL formatted strings.
- Since
0.3.0
-
def
updateExpr(condition: String, set: Map[String, String]): Unit
Update data from the table on the rows that match the given
condition
, which performs the rules defined byset
.Update data from the table on the rows that match the given
condition
, which performs the rules defined byset
.Scala example to increment the column
data
.deltaTable.update( "date > '2018-01-01'", Map("data" -> "data + 1"))
- condition
boolean expression as SQL formatted string object specifying which rows to update.
- set
rules to update a row as a Scala map between target column names and corresponding update expressions as SQL formatted strings.
- Since
0.3.0
-
def
updateExpr(set: Map[String, String]): Unit
Update rows in the table based on the rules defined by
set
.Update rows in the table based on the rules defined by
set
.Java example to increment the column
data
.deltaTable.updateExpr( new HashMap<String, String>() {{ put("data", "data + 1"); }} );
- set
rules to update a row as a Java map between target column names and corresponding update expressions as SQL formatted strings.
- Since
0.3.0
-
def
updateExpr(set: Map[String, String]): Unit
Update rows in the table based on the rules defined by
set
.Update rows in the table based on the rules defined by
set
.Scala example to increment the column
data
.deltaTable.updateExpr(Map("data" -> "data + 1")))
- set
rules to update a row as a Scala map between target column names and corresponding update expressions as SQL formatted strings.
- Since
0.3.0
-
def
upgradeTableProtocol(readerVersion: Int, writerVersion: Int): Unit
Updates the protocol version of the table to leverage new features.
Updates the protocol version of the table to leverage new features. Upgrading the reader version will prevent all clients that have an older version of Delta Lake from accessing this table. Upgrading the writer version will prevent older versions of Delta Lake to write to this table. The reader or writer version cannot be downgraded.
See online documentation and Delta's protocol specification at PROTOCOL.md for more details.
- Since
0.8.0
-
def
vacuum(): DataFrame
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold.
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. This method will return an empty DataFrame on successful completion.
note: This will use the default retention period of 7 days.
- Since
0.3.0
-
def
vacuum(retentionHours: Double): DataFrame
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold.
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. This method will return an empty DataFrame on successful completion.
- retentionHours
The retention threshold in hours. Files required by the table for reading versions earlier than this will be preserved and the rest of them will be deleted.
- Since
0.3.0
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()