Table Utility Commands
Delta Lake tables support the vacuum and history utility commands.
Vacuum
You can remove files that are no longer referenced by a Delta Lake table and are older than the retention threshold by running vacuum on the table. The default retention threshold for the files is 7 days.
The ability to time travel back to a version older than the retention period is lost after running vacuum. The vacuum command is not triggered automatically. Running the vacuum command on the table recursively vacuums the directories associated with the Delta Lake table.
- Scala
import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, pathToTable)

deltaTable.vacuum()     // vacuum files not required by versions older than the default retention period
deltaTable.vacuum(100)  // vacuum files not required by versions more than 100 hours old
- Java
import io.delta.tables.*;

DeltaTable deltaTable = DeltaTable.forPath(spark, pathToTable);

deltaTable.vacuum();     // vacuum files not required by versions older than the default retention period
deltaTable.vacuum(100);  // vacuum files not required by versions more than 100 hours old
See Programmatic API Docs for more details.
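The deletion rule described above (a file is removed only if it is both unreferenced by the table and older than the retention threshold) can be illustrated with a small, self-contained model. This is a conceptual sketch, not Delta Lake's implementation: the class VacuumRetentionSketch, the DataFile type, and the filesToDelete method are hypothetical names introduced for illustration, and measuring file age by last-modified time is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Conceptual model only; this is NOT Delta Lake's implementation.
final class VacuumRetentionSketch {

    // Hypothetical stand-in for a data file found under the table directory.
    static final class DataFile {
        final String path;
        final Instant lastModified;

        DataFile(String path, Instant lastModified) {
            this.path = path;
            this.lastModified = lastModified;
        }
    }

    // Returns paths that are both unreferenced by the table and older than
    // the retention threshold (assumption: age is measured by modification time).
    static List<String> filesToDelete(List<DataFile> allFiles,
                                      Set<String> referencedPaths,
                                      Instant now,
                                      Duration retention) {
        Instant cutoff = now.minus(retention);
        List<String> toDelete = new ArrayList<>();
        for (DataFile file : allFiles) {
            boolean unreferenced = !referencedPaths.contains(file.path);
            boolean olderThanThreshold = file.lastModified.isBefore(cutoff);
            if (unreferenced && olderThanThreshold) {
                toDelete.add(file.path);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-08-10T00:00:00Z");
        List<DataFile> files = List.of(
            new DataFile("part-0001.parquet", Instant.parse("2019-07-29T14:01:40Z")), // old, unreferenced
            new DataFile("part-0002.parquet", Instant.parse("2019-07-29T14:07:47Z")), // old, still referenced
            new DataFile("part-0003.parquet", Instant.parse("2019-08-09T00:00:00Z"))  // recent, unreferenced
        );
        Set<String> referenced = Set.of("part-0002.parquet");

        // Default threshold from the text above: 7 days.
        System.out.println(filesToDelete(files, referenced, now, Duration.ofDays(7)));
    }
}
```

In this model only part-0001.parquet qualifies for deletion: part-0002.parquet is still referenced, and part-0003.parquet is newer than the 7-day cutoff. Passing Duration.ofHours(100) instead would mirror the vacuum(100) call shown earlier.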
History
You can retrieve information on the operations, user, timestamp, and so on for each write to a Delta Lake table by running the history command. The operations are returned in reverse chronological order. By default, table history is retained for 30 days.
- Scala
import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, pathToTable)

val fullHistoryDF = deltaTable.history()     // get the full history of the table
val lastOperationDF = deltaTable.history(1)  // get the last operation
- Java
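import io.delta.tables.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

DeltaTable deltaTable = DeltaTable.forPath(spark, pathToTable);

// The Scala DataFrame alias is not visible from Java, so use Dataset<Row>.
Dataset<Row> fullHistoryDF = deltaTable.history();     // get the full history of the table
Dataset<Row> lastOperationDF = deltaTable.history(1);  // get the last operation on the table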
The returned DataFrame will have the following structure.
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
|version|          timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
|      5|2019-07-29 14:07:47|  null|    null|   DELETE|[predicate -> ["(...|null|    null|     null|          4|          null|        false|
|      4|2019-07-29 14:07:41|  null|    null|   UPDATE|[predicate -> (id...|null|    null|     null|          3|          null|        false|
|      3|2019-07-29 14:07:29|  null|    null|   DELETE|[predicate -> ["(...|null|    null|     null|          2|          null|        false|
|      2|2019-07-29 14:06:56|  null|    null|   UPDATE|[predicate -> (id...|null|    null|     null|          1|          null|        false|
|      1|2019-07-29 14:04:31|  null|    null|   DELETE|[predicate -> ["(...|null|    null|     null|          0|          null|        false|
|      0|2019-07-29 14:01:40|  null|    null|    WRITE|[mode -> ErrorIfE...|null|    null|     null|       null|          null|         true|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
See Programmatic API Docs for more details.