Class DeltaMergeBuilder
- All Implemented Interfaces:
org.apache.spark.internal.Logging, org.apache.spark.sql.delta.util.AnalysisHelper
Builder to specify how to merge data from a source DataFrame into the target Delta table. You can specify any number of whenMatched, whenNotMatched, and whenNotMatchedBySource clauses.
Here are the constraints on these clauses.
- whenMatched clauses:
- The condition in a whenMatched clause is optional. However, if there are multiple
whenMatched clauses, then only the last one may omit the condition.
- When there is more than one whenMatched clause and their conditions (or the lack
thereof) are such that a row satisfies multiple clauses, the action for the first
satisfied clause is executed. In other words, the order of the whenMatched clauses
matters (see the sketch after this list).
- If none of the whenMatched clauses matches a source-target row pair that satisfies
the merge condition, then the target row will not be updated or deleted.
- If you want to update all the columns of the target Delta table with the
corresponding columns of the source DataFrame, then you can use
whenMatched(...).updateAll(). This is equivalent to
whenMatched(...).updateExpr(Map(
("col1", "source.col1"),
("col2", "source.col2"),
...))
- whenNotMatched clauses:
- The condition in a whenNotMatched clause is optional. However, if there are
multiple whenNotMatched clauses, then only the last one may omit the condition.
- When there is more than one whenNotMatched clause and their conditions (or the lack
thereof) are such that a row satisfies multiple clauses, the action for the first
satisfied clause is executed. In other words, the order of the whenNotMatched clauses matters.
- If no whenNotMatched clause is present or if it is present but the non-matching source
row does not satisfy the condition, then the source row is not inserted.
- If you want to insert all the columns of the target Delta table with the
corresponding columns of the source DataFrame, then you can use
whenNotMatched(...).insertAll(). This is equivalent to
whenNotMatched(...).insertExpr(Map(
("col1", "source.col1"),
("col2", "source.col2"),
...))
- whenNotMatchedBySource clauses:
- The condition in a whenNotMatchedBySource clause is optional. However, if there are
multiple whenNotMatchedBySource clauses, then only the last one may omit the condition.
- When there is more than one whenNotMatchedBySource clause and their conditions (or
the lack thereof) are such that a row satisfies multiple clauses, the action for the
first satisfied clause is executed. In other words, the order of the
whenNotMatchedBySource clauses matters.
- If no whenNotMatchedBySource clause is present, or if it is present but the
non-matching target row does not satisfy any of the whenNotMatchedBySource clause
conditions, then the target row will not be updated or deleted.
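Because clause order matters, a conditional clause can take precedence over an unconditional one. A minimal Scala sketch of this ordering, reusing the deltaTable and source names from the examples below and assuming a hypothetical op column on the source:
deltaTable
  .as("target")
  .merge(
    source.as("source"),
    "target.key = source.key")
  .whenMatched("source.op = 'delete'") // evaluated first: matched rows flagged for deletion are removed
  .delete()
  .whenMatched() // only this last clause may omit its condition; all remaining matched rows are updated
  .updateAll()
  .execute()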
Scala example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.withSchemaEvolution()
.whenMatched()
.updateExpr(Map(
"value" -> "source.value"))
.whenNotMatched()
.insertExpr(Map(
"key" -> "source.key",
"value" -> "source.value"))
.whenNotMatchedBySource()
.updateExpr(Map(
"value" -> "target.value + 1"))
.execute()
Java example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.withSchemaEvolution()
.whenMatched()
.updateExpr(
new HashMap<String, String>() {{
put("value", "source.value");
}})
.whenNotMatched()
.insertExpr(
new HashMap<String, String>() {{
put("key", "source.key");
put("value", "source.value");
}})
.whenNotMatchedBySource()
.updateExpr(
new HashMap<String, String>() {{
put("value", "target.value + 1");
}})
.execute();
- Since:
- 0.3.0
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.sql.delta.util.AnalysisHelper
org.apache.spark.sql.delta.util.AnalysisHelper.FakeLogicalPlan, org.apache.spark.sql.delta.util.AnalysisHelper.FakeLogicalPlan$
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
-
Constructor Summary
Constructors
- DeltaMergeBuilder(DeltaTable targetTable, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, org.apache.spark.sql.Column onCondition, scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.plans.logical.DeltaMergeIntoClause> whenClauses)
-
Method Summary
- org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> execute()
Execute the merge operation based on the built matched and not matched actions.
- DeltaMergeMatchedActionBuilder whenMatched()
Build the actions to perform when the merge condition was matched.
- DeltaMergeMatchedActionBuilder whenMatched(String condition)
Build the actions to perform when the merge condition was matched and the given condition is true.
- DeltaMergeMatchedActionBuilder whenMatched(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was matched and the given condition is true.
- DeltaMergeNotMatchedActionBuilder whenNotMatched()
Build the actions to perform when the merge condition was not matched.
- DeltaMergeNotMatchedActionBuilder whenNotMatched(String condition)
Build the actions to perform when the merge condition was not matched and the given condition is true.
- DeltaMergeNotMatchedActionBuilder whenNotMatched(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was not matched and the given condition is true.
- DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource()
Build the actions to perform when the merge condition was not matched by the source.
- DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource(String condition)
Build the actions to perform when the merge condition was not matched by the source and the given condition is true.
- DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was not matched by the source and the given condition is true.
- DeltaMergeBuilder withSchemaEvolution()
Enable schema evolution for the merge operation.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.delta.util.AnalysisHelper
improveUnsupportedOpError, resolveReferencesForExpressions, toDataset, tryResolveReferences, tryResolveReferencesForExpressions, tryResolveReferencesForExpressions
Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
-
Constructor Details
-
DeltaMergeBuilder
public DeltaMergeBuilder(DeltaTable targetTable, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, org.apache.spark.sql.Column onCondition, scala.collection.immutable.Seq<org.apache.spark.sql.catalyst.plans.logical.DeltaMergeIntoClause> whenClauses)
-
-
Method Details
-
whenMatched
public DeltaMergeMatchedActionBuilder whenMatched()
Build the actions to perform when the merge condition was matched. This returns a DeltaMergeMatchedActionBuilder object which can be used to specify how to update or delete the matched target table row with the source row.
- Returns:
- (undocumented)
- Since:
- 0.3.0
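A minimal Scala sketch, assuming deltaTable and source are set up as in the class examples above:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenMatched() // no condition: applies to every matched source-target pair
  .updateAll()
  .execute()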
-
whenMatched
public DeltaMergeMatchedActionBuilder whenMatched(String condition)
Build the actions to perform when the merge condition was matched and the given condition is true. This returns a DeltaMergeMatchedActionBuilder object which can be used to specify how to update or delete the matched target table row with the source row.
- Parameters:
- condition - boolean expression as a SQL formatted string
- Returns:
- (undocumented)
- Since:
- 0.3.0
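A minimal Scala sketch using a SQL-formatted string condition, assuming the class examples' deltaTable and source:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenMatched("source.value <> target.value") // only update rows whose value changed
  .updateExpr(Map("value" -> "source.value"))
  .execute()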
-
whenMatched
public DeltaMergeMatchedActionBuilder whenMatched(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was matched and the given condition is true. This returns a DeltaMergeMatchedActionBuilder object which can be used to specify how to update or delete the matched target table row with the source row.
- Parameters:
- condition - boolean expression as a Column object
- Returns:
- (undocumented)
- Since:
- 0.3.0
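The same sketch with the condition as a Column object, assuming the class examples' deltaTable and source; expr comes from org.apache.spark.sql.functions:
import org.apache.spark.sql.functions.expr

deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenMatched(expr("source.value <> target.value")) // condition as a Column
  .updateExpr(Map("value" -> "source.value"))
  .execute()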
-
whenNotMatched
public DeltaMergeNotMatchedActionBuilder whenNotMatched()
Build the actions to perform when the merge condition was not matched. This returns a DeltaMergeNotMatchedActionBuilder object which can be used to specify how to insert the new source row into the target table.
- Returns:
- (undocumented)
- Since:
- 0.3.0
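A minimal Scala sketch, assuming the class examples' deltaTable and source:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatched() // no condition: every non-matching source row is inserted
  .insertAll()
  .execute()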
-
whenNotMatched
public DeltaMergeNotMatchedActionBuilder whenNotMatched(String condition)
Build the actions to perform when the merge condition was not matched and the given condition is true. This returns a DeltaMergeNotMatchedActionBuilder object which can be used to specify how to insert the new source row into the target table.
- Parameters:
- condition - boolean expression as a SQL formatted string
- Returns:
- (undocumented)
- Since:
- 0.3.0
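A minimal Scala sketch with a SQL-formatted string condition, assuming the class examples' deltaTable and source:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatched("source.value IS NOT NULL") // insert only source rows that carry a value
  .insertExpr(Map(
    "key" -> "source.key",
    "value" -> "source.value"))
  .execute()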
-
whenNotMatched
public DeltaMergeNotMatchedActionBuilder whenNotMatched(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was not matched and the given condition is true. This returns a DeltaMergeNotMatchedActionBuilder object which can be used to specify how to insert the new source row into the target table.
- Parameters:
- condition - boolean expression as a Column object
- Returns:
- (undocumented)
- Since:
- 0.3.0
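The same idea with a Column condition, assuming the class examples' deltaTable and source; col comes from org.apache.spark.sql.functions:
import org.apache.spark.sql.functions.col

deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatched(col("source.value").isNotNull) // condition as a Column
  .insertAll()
  .execute()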
-
whenNotMatchedBySource
public DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource()
Build the actions to perform when the merge condition was not matched by the source. This returns a DeltaMergeNotMatchedBySourceActionBuilder object which can be used to specify how to update or delete the target table row.
- Returns:
- (undocumented)
- Since:
- 2.3.0
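A minimal Scala sketch, assuming the class examples' deltaTable and source:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatchedBySource() // target rows with no matching source row
  .delete()
  .execute()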
-
whenNotMatchedBySource
public DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource(String condition)
Build the actions to perform when the merge condition was not matched by the source and the given condition is true. This returns a DeltaMergeNotMatchedBySourceActionBuilder object which can be used to specify how to update or delete the target table row.
- Parameters:
- condition - boolean expression as a SQL formatted string
- Returns:
- (undocumented)
- Since:
- 2.3.0
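A minimal Scala sketch with a SQL-formatted string condition, assuming the class examples' deltaTable and source:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatchedBySource("target.value > 0") // only reset target rows with a positive value
  .updateExpr(Map("value" -> "0"))
  .execute()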
-
whenNotMatchedBySource
public DeltaMergeNotMatchedBySourceActionBuilder whenNotMatchedBySource(org.apache.spark.sql.Column condition)
Build the actions to perform when the merge condition was not matched by the source and the given condition is true. This returns a DeltaMergeNotMatchedBySourceActionBuilder object which can be used to specify how to update or delete the target table row.
- Parameters:
- condition - boolean expression as a Column object
- Returns:
- (undocumented)
- Since:
- 2.3.0
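The same idea with a Column condition, assuming the class examples' deltaTable and source; expr comes from org.apache.spark.sql.functions:
import org.apache.spark.sql.functions.expr

deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .whenNotMatchedBySource(expr("target.value > 0")) // condition as a Column
  .delete()
  .execute()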
-
withSchemaEvolution
public DeltaMergeBuilder withSchemaEvolution()
Enable schema evolution for the merge operation. This allows the schema of the target table/columns to be automatically updated based on the schema of the source table/columns.
- Returns:
- (undocumented)
- Since:
- 3.2.0
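A minimal Scala sketch, assuming the class examples' deltaTable and a source whose schema may contain columns the target lacks:
deltaTable
  .as("target")
  .merge(source.as("source"), "target.key = source.key")
  .withSchemaEvolution() // new source columns are added to the target schema during the merge
  .whenMatched()
  .updateAll()
  .whenNotMatched()
  .insertAll()
  .execute()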
-
execute
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> execute()
Execute the merge operation based on the built matched and not matched actions.
- Returns:
- (undocumented)
- Since:
- 0.3.0
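A minimal Scala sketch, assuming the class examples' deltaTable and source; note that the contents of the returned Dataset are not documented on this page:
val result: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] =
  deltaTable
    .as("target")
    .merge(source.as("source"), "target.key = source.key")
    .whenMatched()
    .updateAll()
    .execute() // the merge runs only when execute() is called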
-