public class DeltaTable
extends Object
implements scala.Serializable
Main class for programmatically interacting with Delta tables. You can create DeltaTable instances using the static methods, for example:
DeltaTable.forPath(sparkSession, pathToTheDeltaTable)
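For instance, a minimal Java sketch of loading a table and reading it back as a DataFrame (the path is illustrative, and an existing SparkSession named spark is assumed):
import io.delta.tables.DeltaTable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Load an existing Delta table by path and view it as a DataFrame.
DeltaTable deltaTable = DeltaTable.forPath(spark, "/tmp/delta/events");
Dataset<Row> df = deltaTable.toDF();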
Modifier and Type | Method and Description |
---|---|
DeltaTable | alias(String alias) :: Evolving :: |
DeltaTable | as(String alias) :: Evolving :: |
static DeltaTable | convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier) :: Evolving :: |
static DeltaTable | convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier, String partitionSchema) :: Evolving :: |
static DeltaTable | convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier, org.apache.spark.sql.types.StructType partitionSchema) :: Evolving :: |
void | delete() :: Evolving :: |
void | delete(org.apache.spark.sql.Column condition) :: Evolving :: |
void | delete(String condition) :: Evolving :: |
static DeltaTable | forName(org.apache.spark.sql.SparkSession sparkSession, String tableName) Create a DeltaTable using the given table or view name using the given SparkSession. |
static DeltaTable | forName(String tableOrViewName) Create a DeltaTable using the given table or view name using the active SparkSession. |
static DeltaTable | forPath(org.apache.spark.sql.SparkSession sparkSession, String path) :: Evolving :: |
static DeltaTable | forPath(String path) :: Evolving :: |
void | generate(String mode) :: Evolving :: |
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | history() :: Evolving :: |
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | history(int limit) :: Evolving :: |
static boolean | isDeltaTable(org.apache.spark.sql.SparkSession sparkSession, String identifier) :: Evolving :: |
static boolean | isDeltaTable(String identifier) :: Evolving :: |
DeltaMergeBuilder | merge(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, org.apache.spark.sql.Column condition) :: Evolving :: |
DeltaMergeBuilder | merge(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, String condition) :: Evolving :: |
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | toDF() :: Evolving :: |
void | update(org.apache.spark.sql.Column condition, scala.collection.immutable.Map<String,org.apache.spark.sql.Column> set) :: Evolving :: |
void | update(org.apache.spark.sql.Column condition, java.util.Map<String,org.apache.spark.sql.Column> set) :: Evolving :: |
void | update(scala.collection.immutable.Map<String,org.apache.spark.sql.Column> set) :: Evolving :: |
void | update(java.util.Map<String,org.apache.spark.sql.Column> set) :: Evolving :: |
void | updateExpr(scala.collection.immutable.Map<String,String> set) :: Evolving :: |
void | updateExpr(java.util.Map<String,String> set) :: Evolving :: |
void | updateExpr(String condition, scala.collection.immutable.Map<String,String> set) :: Evolving :: |
void | updateExpr(String condition, java.util.Map<String,String> set) :: Evolving :: |
void | upgradeTableProtocol(int readerVersion, int writerVersion) :: Evolving :: |
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | vacuum() :: Evolving :: |
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | vacuum(double retentionHours) :: Evolving :: |
public static DeltaTable convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier, org.apache.spark.sql.types.StructType partitionSchema)
Create a DeltaTable from the given parquet table and partition schema. Takes an existing parquet table and constructs a delta transaction log in the base path of that table.
Note: Any changes to the table during the conversion process may not result in a consistent state at the end of the conversion. Users should stop any changes to the table before the conversion is started.
An example usage would be
io.delta.tables.DeltaTable.convertToDelta(
spark,
"parquet.`/path`",
new StructType().add(StructField("key1", LongType)).add(StructField("key2", StringType)))
spark - the SparkSession to use for the conversion
identifier - the identifier of the parquet table, e.g. "parquet.`/path`"
partitionSchema - the partition schema of the parquet table, as a StructType

public static DeltaTable convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier, String partitionSchema)
Create a DeltaTable from the given parquet table and partition schema. Takes an existing parquet table and constructs a delta transaction log in the base path of that table.
Note: Any changes to the table during the conversion process may not result in a consistent state at the end of the conversion. Users should stop any changes to the table before the conversion is started.
An example usage would be
io.delta.tables.DeltaTable.convertToDelta(
spark,
"parquet.`/path`",
"key1 long, key2 string")
spark - the SparkSession to use for the conversion
identifier - the identifier of the parquet table, e.g. "parquet.`/path`"
partitionSchema - the partition schema of the parquet table, as a DDL-formatted string (e.g. "key1 long, key2 string")

public static DeltaTable convertToDelta(org.apache.spark.sql.SparkSession spark, String identifier)
Create a DeltaTable from the given parquet table. Takes an existing parquet table and constructs a delta transaction log in the base path of the table.
Note: Any changes to the table during the conversion process may not result in a consistent state at the end of the conversion. Users should stop any changes to the table before the conversion is started.
An example usage would be
io.delta.tables.DeltaTable.convertToDelta(
spark,
"parquet.`/path`")
spark - the SparkSession to use for the conversion
identifier - the identifier of the parquet table, e.g. "parquet.`/path`"

public static DeltaTable forPath(String path)
Create a DeltaTable for the data at the given path.
Note: This uses the active SparkSession in the current thread to read the table data. Hence, this throws an error if the active SparkSession has not been set, that is, SparkSession.getActiveSession() is empty.
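A minimal usage sketch (the path shown is illustrative):
DeltaTable deltaTable = io.delta.tables.DeltaTable.forPath("/tmp/delta/events");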
path - the path of the Delta table

public static DeltaTable forPath(org.apache.spark.sql.SparkSession sparkSession, String path)
Create a DeltaTable for the data at the given path using the given SparkSession.
sparkSession - the SparkSession to use for reading the table
path - the path of the Delta table

public static DeltaTable forName(String tableOrViewName)
Create a DeltaTable using the given table or view name.
Note: This uses the active SparkSession in the current thread to read the table data. Hence, this throws an error if the active SparkSession has not been set, that is, SparkSession.getActiveSession() is empty.
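A minimal usage sketch (the table name is illustrative):
DeltaTable deltaTable = io.delta.tables.DeltaTable.forName("events");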
tableOrViewName - the name of the table or view

public static DeltaTable forName(org.apache.spark.sql.SparkSession sparkSession, String tableName)
Create a DeltaTable using the given table or view name using the given SparkSession.
sparkSession - the SparkSession to use for reading the table
tableName - the name of the table or view

public static boolean isDeltaTable(org.apache.spark.sql.SparkSession sparkSession, String identifier)
Check if the provided identifier string, in this case a file path, is the root of a Delta table using the given SparkSession.
An example would be
DeltaTable.isDeltaTable(spark, "path/to/table")
sparkSession - the SparkSession to use
identifier - the file path of the table root

public static boolean isDeltaTable(String identifier)
Check if the provided identifier string, in this case a file path, is the root of a Delta table.
Note: This uses the active SparkSession in the current thread to search for the table. Hence, this throws an error if the active SparkSession has not been set, that is, SparkSession.getActiveSession() is empty.
An example would be
DeltaTable.isDeltaTable("/path/to/table")
identifier - the file path of the table root

public DeltaTable as(String alias)
Apply an alias to the DeltaTable. This is similar to Dataset.as(alias) or SQL tableName AS alias.
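A minimal usage sketch (the alias is illustrative; aliases are typically used to disambiguate columns in a subsequent merge):
DeltaTable target = deltaTable.as("target");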
alias - the alias to apply to the DeltaTable

public DeltaTable alias(String alias)
Apply an alias to the DeltaTable. This is similar to Dataset.as(alias) or SQL tableName AS alias.
alias - the alias to apply to the DeltaTable

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> toDF()
Get a DataFrame (that is, Dataset[Row]) representation of this Delta table.
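For example (a minimal sketch, assuming an existing DeltaTable instance named deltaTable):
Dataset<Row> df = deltaTable.toDF();
df.show();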
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> vacuum(double retentionHours)
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. This method will return an empty DataFrame on successful completion.
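A minimal usage sketch (168 hours, i.e. 7 days, is an illustrative retention threshold):
deltaTable.vacuum(168);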
retentionHours - The retention threshold in hours. Files required by the table for reading versions earlier than this will be preserved and the rest of them will be deleted.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> vacuum()
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. This method will return an empty DataFrame on successful completion.
Note: This will use the default retention period of 7 days.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> history(int limit)
Get the information of the latest limit commits on this table as a Spark DataFrame. The information is in reverse chronological order.
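For example (a minimal sketch):
Dataset<Row> lastTenCommits = deltaTable.history(10);
lastTenCommits.show();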
limit - The number of previous commits to get history for
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> history()
Get the information of all the available commits on this table as a Spark DataFrame. The information is in reverse chronological order.
public void generate(String mode)
Generate a manifest for the given Delta table.
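For example (a minimal sketch):
deltaTable.generate("symlink_format_manifest");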
mode - Specifies the mode for the generation of the manifest. The valid modes are as follows (not case sensitive):
- "symlink_format_manifest": This will generate manifests in symlink format for Presto and Athena read support.
See the online documentation for more information.

public void delete(String condition)
Delete data from the table that matches the given condition.
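For example (a minimal sketch; the column and cutoff date are illustrative):
deltaTable.delete("date < '2017-01-01'");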
condition - Boolean SQL expression
public void delete(org.apache.spark.sql.Column condition)
Delete data from the table that matches the given condition.
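For example (a minimal sketch; the column and cutoff date are illustrative):
import org.apache.spark.sql.functions;
deltaTable.delete(functions.col("date").lt("2017-01-01"));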
condition - Boolean SQL expression
public void delete()
Delete data from the table.
public void update(scala.collection.immutable.Map<String,org.apache.spark.sql.Column> set)
Update rows in the table based on the rules defined by set.
Scala example to increment the column data:
import org.apache.spark.sql.functions._
deltaTable.update(Map("data" -> col("data") + 1))
set - rules to update a row as a Scala map between target column names and corresponding update expressions as Column objects.

public void update(java.util.Map<String,org.apache.spark.sql.Column> set)
Update rows in the table based on the rules defined by set.
Java example to increment the column data:
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;
deltaTable.update(
new HashMap<String, Column>() {{
put("data", functions.col("data").plus(1));
}}
);
set - rules to update a row as a Java map between target column names and corresponding update expressions as Column objects.

public void update(org.apache.spark.sql.Column condition, scala.collection.immutable.Map<String,org.apache.spark.sql.Column> set)
Update data from the table on the rows that match the given condition based on the rules defined by set.
Scala example to increment the column data:
import org.apache.spark.sql.functions._
deltaTable.update(
col("date") > "2018-01-01",
Map("data" -> col("data") + 1))
condition - boolean expression as Column object specifying which rows to update.
set - rules to update a row as a Scala map between target column names and corresponding update expressions as Column objects.

public void update(org.apache.spark.sql.Column condition, java.util.Map<String,org.apache.spark.sql.Column> set)
Update data from the table on the rows that match the given condition based on the rules defined by set.
Java example to increment the column data:
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;
deltaTable.update(
functions.col("date").gt("2018-01-01"),
new HashMap<String, Column>() {{
put("data", functions.col("data").plus(1));
}}
);
condition - boolean expression as Column object specifying which rows to update.
set - rules to update a row as a Java map between target column names and corresponding update expressions as Column objects.

public void updateExpr(scala.collection.immutable.Map<String,String> set)
Update rows in the table based on the rules defined by set.
Scala example to increment the column data:
deltaTable.updateExpr(Map("data" -> "data + 1"))
set - rules to update a row as a Scala map between target column names and corresponding update expressions as SQL formatted strings.

public void updateExpr(java.util.Map<String,String> set)
Update rows in the table based on the rules defined by set.
Java example to increment the column data:
deltaTable.updateExpr(
new HashMap<String, String>() {{
put("data", "data + 1");
}}
);
set - rules to update a row as a Java map between target column names and corresponding update expressions as SQL formatted strings.

public void updateExpr(String condition, scala.collection.immutable.Map<String,String> set)
Update data from the table on the rows that match the given condition, which performs the rules defined by set.
Scala example to increment the column data:
deltaTable.updateExpr(
"date > '2018-01-01'",
Map("data" -> "data + 1"))
condition - boolean expression as SQL formatted string object specifying which rows to update.
set - rules to update a row as a Scala map between target column names and corresponding update expressions as SQL formatted strings.

public void updateExpr(String condition, java.util.Map<String,String> set)
Update data from the table on the rows that match the given condition, which performs the rules defined by set.
Java example to increment the column data:
deltaTable.updateExpr(
"date > '2018-01-01'",
new HashMap<String, String>() {{
put("data", "data + 1");
}}
);
condition - boolean expression as SQL formatted string object specifying which rows to update.
set - rules to update a row as a Java map between target column names and corresponding update expressions as SQL formatted strings.

public DeltaMergeBuilder merge(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, String condition)
Merge data from the source DataFrame based on the given merge condition. This returns a DeltaMergeBuilder object that can be used to specify the update, delete, or insert actions to be performed on rows based on whether the rows matched the condition or not. See the DeltaMergeBuilder for a full description of this operation and what combinations of update, delete and insert operations are allowed.
Scala example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.whenMatched
.updateExpr(Map(
"value" -> "source.value"))
.whenNotMatched
.insertExpr(Map(
"key" -> "source.key",
"value" -> "source.value"))
.execute()
Java example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.whenMatched
.updateExpr(
new HashMap<String, String>() {{
put("value" -> "source.value");
}})
.whenNotMatched
.insertExpr(
new HashMap<String, String>() {{
put("key", "source.key");
put("value", "source.value");
}})
.execute();
source - source DataFrame to be merged.
condition - boolean expression as SQL formatted string

public DeltaMergeBuilder merge(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> source, org.apache.spark.sql.Column condition)
Merge data from the source DataFrame based on the given merge condition. This returns a DeltaMergeBuilder object that can be used to specify the update, delete, or insert actions to be performed on rows based on whether the rows matched the condition or not. See the DeltaMergeBuilder for a full description of this operation and what combinations of update, delete and insert operations are allowed.
Scala example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.whenMatched
.updateExpr(Map(
"value" -> "source.value"))
.whenNotMatched
.insertExpr(Map(
"key" -> "source.key",
"value" -> "source.value"))
.execute()
Java example to update a key-value Delta table with new key-values from a source DataFrame:
deltaTable
.as("target")
.merge(
source.as("source"),
"target.key = source.key")
.whenMatched
.updateExpr(
new HashMap<String, String>() {{
put("value" -> "source.value")
}})
.whenNotMatched
.insertExpr(
new HashMap<String, String>() {{
put("key", "source.key");
put("value", "source.value");
}})
.execute();
source - source DataFrame to be merged.
condition - boolean expression as a Column object

public void upgradeTableProtocol(int readerVersion, int writerVersion)
Updates the protocol version of the table to leverage new features. Upgrading the reader version will prevent all clients that have an older version of Delta Lake from accessing this table. Upgrading the writer version will prevent older versions of Delta Lake from writing to this table. The reader or writer version cannot be downgraded.
See online documentation and Delta's protocol specification at PROTOCOL.md for more details.
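For example (a minimal sketch; the protocol versions shown are illustrative):
deltaTable.upgradeTableProtocol(1, 3);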
readerVersion - the new reader protocol version for the table
writerVersion - the new writer protocol version for the table