@Evolving public interface ParquetHandler
| Modifier and Type | Method and Description |
|---|---|
| `CloseableIterator<ColumnarBatch>` | `readParquetFiles(CloseableIterator<FileStatus> fileIter, StructType physicalSchema, java.util.Optional<Predicate> predicate)` Read the Parquet format files at the given locations and return the data as a `ColumnarBatch` with the columns requested by `physicalSchema`. |
| `void` | `writeParquetFileAtomically(String filePath, CloseableIterator<FilteredColumnarBatch> data)` Write the given data as a Parquet file. |
| `CloseableIterator<DataFileStatus>` | `writeParquetFiles(String directoryPath, CloseableIterator<FilteredColumnarBatch> dataIter, java.util.List<Column> statsColumns)` Write the given data batches to Parquet files. |
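For context, an engine typically obtains a `ParquetHandler` from its `Engine` implementation. Below is a minimal sketch, assuming the Hadoop-backed `DefaultEngine` from the `delta-kernel-defaults` module and the `Engine.getParquetHandler()` accessor; verify both against your Kernel version:

```java
import org.apache.hadoop.conf.Configuration;

import io.delta.kernel.defaults.engine.DefaultEngine;
import io.delta.kernel.engine.Engine;
import io.delta.kernel.engine.ParquetHandler;

public class GetHandlerExample {
  public static void main(String[] args) {
    // DefaultEngine wires up Hadoop-backed handlers, including the ParquetHandler.
    Engine engine = DefaultEngine.create(new Configuration());
    ParquetHandler parquetHandler = engine.getParquetHandler();
    System.out.println("ParquetHandler impl: " + parquetHandler.getClass().getName());
  }
}
```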
`CloseableIterator<ColumnarBatch> readParquetFiles(CloseableIterator<FileStatus> fileIter, StructType physicalSchema, java.util.Optional<Predicate> predicate) throws java.io.IOException`

Read the Parquet format files at the given locations and return the data as a `ColumnarBatch` with the columns requested by `physicalSchema`.

If `physicalSchema` has a `StructField` with column name `StructField.METADATA_ROW_INDEX_COLUMN_NAME` and the field is a metadata column (`StructField.isMetadataColumn()`), the column must be populated with the file row index.

How does a column in `physicalSchema` match a column in the Parquet file? If the `StructField` has a field id in the metadata with key `parquet.field.id`, the reader first attempts to match the column by id. If the column is not found by id, it is matched by name. When matching by name, a case-sensitive match is tried first; if none is found, a case-insensitive match is attempted.

Parameters:

- `fileIter` - Iterator of files to read data from.
- `physicalSchema` - Select list of columns to read from the Parquet file.
- `predicate` - Optional predicate which the Parquet reader can use to prune rows that don't satisfy it. Because pruning is optional and may be incomplete, the caller is still responsible for applying the predicate on the data returned by this method.

Returns:

- `ColumnarBatch`s containing the data in columnar format. It is the responsibility of the caller to close the iterator. The data is returned in the same order as the files given in `fileIter`.

Throws:

- `java.io.IOException` - if an I/O error occurs during the read.
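A sketch of calling `readParquetFiles` under the contract above. The file path, size, schema, and predicate are illustrative, and the `asCloseable` adapter is a local helper, not a Kernel API; `FileStatus.of`, `LongType.LONG`, `Literal.ofLong`, and the two-argument `Predicate` constructor are assumed to match recent Kernel releases:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

import io.delta.kernel.data.ColumnarBatch;
import io.delta.kernel.engine.ParquetHandler;
import io.delta.kernel.expressions.Column;
import io.delta.kernel.expressions.Literal;
import io.delta.kernel.expressions.Predicate;
import io.delta.kernel.types.LongType;
import io.delta.kernel.types.StructType;
import io.delta.kernel.utils.CloseableIterator;
import io.delta.kernel.utils.FileStatus;

public class ReadExample {
  // Local adapter: wrap a plain Iterator as a CloseableIterator (no resources to release).
  static <T> CloseableIterator<T> asCloseable(Iterator<T> it) {
    return new CloseableIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
      @Override public void close() { /* nothing to release */ }
    };
  }

  public static void readOneFile(ParquetHandler handler) throws IOException {
    // Hypothetical file; FileStatus.of(path, size, modificationTime) builds the status.
    FileStatus file = FileStatus.of("/data/part-00000.parquet", 1024L, 0L);

    // Select list: read only the `id` column.
    StructType physicalSchema = new StructType().add("id", LongType.LONG);

    // Pruning hint: id = 5. Pruning is best-effort, so the predicate must be re-applied.
    Predicate predicate = new Predicate("=", new Column("id"), Literal.ofLong(5L));

    try (CloseableIterator<ColumnarBatch> batches = handler.readParquetFiles(
        asCloseable(List.of(file).iterator()), physicalSchema, Optional.of(predicate))) {
      while (batches.hasNext()) {
        ColumnarBatch batch = batches.next();
        // The reader may return rows that do NOT satisfy the predicate;
        // the caller must still filter the returned data.
        System.out.println("Read batch of " + batch.getSize() + " rows");
      }
    }
  }
}
```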
`CloseableIterator<DataFileStatus> writeParquetFiles(String directoryPath, CloseableIterator<FilteredColumnarBatch> dataIter, java.util.List<Column> statsColumns) throws java.io.IOException`

Write the given data batches to Parquet files.
Parameters:

- `directoryPath` - Location where the data files should be written.
- `dataIter` - Iterator of data batches to write. It is the responsibility of the caller to close the iterator.
- `statsColumns` - List of columns to collect statistics for. Statistics collection is optional; if the implementation does not support it, it may return no statistics.

Returns:

- An iterator of `DataFileStatus` containing the status of the written files. Each status contains the file path and the optionally collected statistics for the file. It is the responsibility of the caller to close the iterator.

Throws:

- `java.io.IOException` - if an I/O error occurs during the file writing. This may leave some files already written in the directory. It is the responsibility of the caller to clean up.
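A sketch of driving `writeParquetFiles` and consuming the returned statuses. The directory path and stats column are illustrative, and `DataFileStatus.getPath()` (inherited from `FileStatus`) is assumed from the Kernel utils API:

```java
import java.io.IOException;
import java.util.List;

import io.delta.kernel.data.FilteredColumnarBatch;
import io.delta.kernel.engine.ParquetHandler;
import io.delta.kernel.expressions.Column;
import io.delta.kernel.utils.CloseableIterator;
import io.delta.kernel.utils.DataFileStatus;

public class WriteFilesExample {
  public static void writeBatches(
      ParquetHandler handler,
      CloseableIterator<FilteredColumnarBatch> dataIter) throws IOException {
    // Ask for statistics on `id`; implementations may skip stats entirely.
    List<Column> statsColumns = List.of(new Column("id"));

    try (CloseableIterator<DataFileStatus> results =
             handler.writeParquetFiles("/data/output-dir", dataIter, statsColumns)) {
      while (results.hasNext()) {
        DataFileStatus status = results.next();
        // Each status carries the written file's path plus any collected statistics.
        System.out.println("Wrote: " + status.getPath());
      }
    }
    // On IOException, partially written files may remain under /data/output-dir;
    // the caller is responsible for cleaning them up.
  }
}
```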
`void writeParquetFileAtomically(String filePath, CloseableIterator<FilteredColumnarBatch> data) throws java.io.IOException`

Write the given data as a Parquet file.
Parameters:

- `filePath` - Fully qualified destination file path.
- `data` - Iterator of `FilteredColumnarBatch`.

Throws:

- `java.nio.file.FileAlreadyExistsException` - if the file already exists and `overwrite` is false.
- `java.io.IOException` - if any other I/O error occurs.
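A sketch of the atomic-write call with the documented failure mode handled. The checkpoint-style target path is hypothetical; the caller supplies the `FilteredColumnarBatch` iterator:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;

import io.delta.kernel.data.FilteredColumnarBatch;
import io.delta.kernel.engine.ParquetHandler;
import io.delta.kernel.utils.CloseableIterator;

public class AtomicWriteExample {
  public static void writeSingleFile(
      ParquetHandler handler,
      CloseableIterator<FilteredColumnarBatch> data) throws IOException {
    // Hypothetical fully qualified destination path.
    String target = "/table/_delta_log/00000000000000000010.checkpoint.parquet";
    try {
      // Writes the file atomically (per the method's contract);
      // fails if the target already exists.
      handler.writeParquetFileAtomically(target, data);
    } catch (FileAlreadyExistsException e) {
      // Another writer got there first; the target file is already present.
      System.err.println("Target already exists: " + target);
    }
  }
}
```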