Object

io.delta.kernel.defaults.engine.DefaultJsonHandler

All Implemented Interfaces: JsonHandler

public class DefaultJsonHandler extends Object implements JsonHandler

Default implementation of JsonHandler based on Hadoop APIs.

Constructor Summary

Constructors

Constructor

Description

DefaultJsonHandler(FileIO fileIO)

Method Summary

Modifier and Type

Method

Description

ColumnarBatch

parseJson(ColumnVector jsonStringVector, StructType outputSchema, Optional<ColumnVector> selectionVector)

Parse the given JSON strings and return the fields requested by outputSchema as columns in a ColumnarBatch.

CloseableIterator<ColumnarBatch>

readJsonFiles(CloseableIterator<FileStatus> scanFileIter, StructType physicalSchema, Optional<Predicate> predicate)

Read and parse the JSON-format files at the given locations and return the data as ColumnarBatches with the columns requested by physicalSchema.

void

writeJsonFileAtomically(String filePath, CloseableIterator<Row> data, boolean overwrite)

Atomically writes the data to a file, using the LogStore implementation in `delta-storage` that matches the destination filesystem.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- DefaultJsonHandler
  
  public DefaultJsonHandler(FileIO fileIO)

Method Details
- parseJson
  
  public ColumnarBatch parseJson(ColumnVector jsonStringVector, StructType outputSchema, Optional<ColumnVector> selectionVector)
  
  Description copied from interface: JsonHandler
  Parse the given JSON strings and return the fields requested by outputSchema as columns in a ColumnarBatch.
  There are a couple of special cases that should be handled for specific data types:
  
  - FloatType and DoubleType: handle non-numeric numbers encoded as strings
    - NaN: "NaN"
    - Positive infinity: "+INF", "Infinity", "+Infinity"
    - Negative infinity: "-INF", "-Infinity"
  - DateType: handle dates encoded as strings in the format "yyyy-MM-dd"
  - TimestampType: handle timestamps encoded as strings in the format "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"
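The special cases above can be sketched with the Java standard library alone. This is a minimal illustration, not the handler's actual parsing code: the class `JsonSpecialValuesSketch` and its method names are hypothetical, and returning epoch days and epoch microseconds is an assumption made for this sketch.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Delta Kernel.
final class JsonSpecialValuesSketch {
    // Same pattern string as the one documented for TimestampType.
    private static final DateTimeFormatter TIMESTAMP_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    // Maps the documented string encodings to double values;
    // any other input is parsed as an ordinary number.
    static double parseDouble(String text) {
        switch (text) {
            case "NaN":
                return Double.NaN;
            case "+INF":
            case "Infinity":
            case "+Infinity":
                return Double.POSITIVE_INFINITY;
            case "-INF":
            case "-Infinity":
                return Double.NEGATIVE_INFINITY;
            default:
                return Double.parseDouble(text);
        }
    }

    // Parses a "yyyy-MM-dd" date; the epoch-day result is an assumption of this sketch.
    static long parseDateAsEpochDays(String text) {
        return LocalDate.parse(text).toEpochDay();
    }

    // Parses a timestamp with an explicit offset; the epoch-microsecond result
    // is an assumption of this sketch.
    static long parseTimestampAsEpochMicros(String text) {
        OffsetDateTime parsed = OffsetDateTime.parse(text, TIMESTAMP_FORMAT);
        return ChronoUnit.MICROS.between(Instant.EPOCH, parsed.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(parseDouble("+INF"));
        System.out.println(parseDateAsEpochDays("2024-01-15"));
        System.out.println(parseTimestampAsEpochMicros("2024-01-15T10:30:00.000+01:00"));
    }
}
```

Note that `Double.parseDouble` already accepts "NaN" and the "Infinity" spellings, so only "+INF" and "-INF" strictly need the explicit branch; the switch lists all of them to mirror the documented cases.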
  Specified by:
  
  parseJson in interface JsonHandler
  
  Parameters:
  
  jsonStringVector - String ColumnVector of valid JSON strings.
  
  outputSchema - Schema of the data to return from the parsed JSON. If any requested fields are missing in the JSON string, a null is returned for that particular field in the returned Row. The type for each given field is expected to match the type in the JSON.
  
  selectionVector - Optional selection vector indicating which rows of JSON to parse. If present, only the selected rows should be parsed. Unselected rows should be all null in the returned batch.
  
  Returns:
  
  a ColumnarBatch of schema outputSchema with one row for each entry in jsonStringVector
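The selection-vector contract (one output row per input entry, with nulls for unselected rows) can be illustrated without the Kernel types. The sketch below is an assumption-laden stand-in: `SelectionVectorSketch`, `parseSelected`, the `boolean[]` selection, and the `parseRow` function are all hypothetical and replace `ColumnVector`, `ColumnarBatch`, and the real JSON parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in, not part of Delta Kernel.
final class SelectionVectorSketch {
    // Returns one entry per input string. Entries whose selection flag is false
    // are not parsed and are left as null. An empty Optional selects every row.
    static <T> List<T> parseSelected(
            List<String> jsonStrings,
            Optional<boolean[]> selection,
            Function<String, T> parseRow) {
        List<T> rows = new ArrayList<>(jsonStrings.size());
        for (int i = 0; i < jsonStrings.size(); i++) {
            boolean selected = !selection.isPresent() || selection.get()[i];
            rows.add(selected ? parseRow.apply(jsonStrings.get(i)) : null);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("{\"a\":1}", "{\"a\":2}", "{\"a\":3}");
        boolean[] selection = {true, false, true};
        // String::length stands in for a real JSON parser in this sketch.
        System.out.println(parseSelected(input, Optional.of(selection), String::length));
    }
}
```

The output list always has the same length as the input, which matches the documented return value of one row for each entry in jsonStringVector.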
- readJsonFiles
  
  public CloseableIterator<ColumnarBatch> readJsonFiles(CloseableIterator<FileStatus> scanFileIter, StructType physicalSchema, Optional<Predicate> predicate) throws IOException
  
  Description copied from interface: JsonHandler
  
  Read and parse the JSON-format files at the given locations and return the data as ColumnarBatches with the columns requested by physicalSchema.
  
  Specified by:
  
  readJsonFiles in interface JsonHandler
  
  Parameters:
  
  scanFileIter - Iterator of files to read data from.
  
  physicalSchema - Select list of columns to read from the JSON file.
  
  predicate - Optional predicate that the JSON reader may use to prune rows that don't satisfy it. Because pruning is optional and may be incomplete, the caller is still responsible for applying the predicate to the data returned by this method.
  
  Returns:
  
  an iterator of ColumnarBatches containing the data in columnar format. It is the responsibility of the caller to close the iterator. The data is returned in the same order as the files given in scanFileIter.
  
  Throws:
  
  IOException - if an I/O error occurs during the read.
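Two caller obligations follow from this contract: re-apply the predicate to whatever the reader returns, and close the iterator. The sketch below shows both with standard-library types only. `ClosingIterator`, `wrap`, and `readAll` are hypothetical names introduced for illustration; they stand in for `CloseableIterator<ColumnarBatch>` and do not reproduce the Kernel API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in, not part of Delta Kernel.
final class ReapplyPredicateSketch {
    // An iterator that the caller must close. close() declares no checked
    // exception here to keep the sketch short.
    interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close();
    }

    // Wraps a list as a ClosingIterator and runs onClose when it is closed.
    static <T> ClosingIterator<T> wrap(List<T> items, Runnable onClose) {
        Iterator<T> delegate = items.iterator();
        return new ClosingIterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
            @Override public void close() { onClose.run(); }
        };
    }

    // Drains the iterator, applies the predicate to every element regardless of
    // any pruning the reader may have done, and closes the iterator afterwards.
    static <T> List<T> readAll(ClosingIterator<T> rows, Predicate<T> predicate) {
        List<T> kept = new ArrayList<>();
        try (ClosingIterator<T> it = rows) {
            while (it.hasNext()) {
                T row = it.next();
                if (predicate.test(row)) {
                    kept.add(row);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pretend the reader pruned nothing: values below the threshold still arrive.
        ClosingIterator<Integer> rows =
            wrap(Arrays.asList(1, 5, 12, 20), () -> System.out.println("closed"));
        System.out.println(readAll(rows, value -> value > 10));
    }
}
```

Using try-with-resources means the iterator is closed even if the predicate or the underlying read throws partway through.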
- writeJsonFileAtomically
  
  public void writeJsonFileAtomically(String filePath, CloseableIterator<Row> data, boolean overwrite) throws IOException
  
  Atomically writes the data to a file, using the LogStore implementation in `delta-storage` that matches the destination filesystem.
  
  Specified by:
  
  writeJsonFileAtomically in interface JsonHandler
  
  Parameters:
  
  filePath - Destination file path
  
  data - Data to write as JSON
  
  overwrite - If true, the file is overwritten if it already exists. If false and the file already exists, a FileAlreadyExistsException is thrown.
  
  Throws:
  
  IOException - if an I/O error occurs during the write.
  
  FileAlreadyExistsException - if the file already exists and overwrite is false.
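The overwrite contract can be illustrated on a local filesystem with `java.nio.file`. This sketch is not what the LogStore implementations do, and it is not safe against concurrent writers: the existence check performed by `Files.move` without options is not guaranteed to be atomic. It only shows the observable behavior: readers never see a partially written file, and a second write with overwrite set to false fails. `AtomicWriteSketch` and `writeLines` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical local-filesystem illustration, not part of Delta Kernel.
final class AtomicWriteSketch {
    // Writes all lines to a temporary file in the target directory, then moves
    // it into place so that the target never holds partial content.
    static void writeLines(Path target, List<String> jsonLines, boolean overwrite)
            throws IOException {
        Path directory = target.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(directory, ".tmp-", ".json");
        try {
            Files.write(temp, jsonLines, StandardCharsets.UTF_8);
            if (overwrite) {
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
            } else {
                // Throws FileAlreadyExistsException when the target is already present.
                Files.move(temp, target);
            }
        } finally {
            // No-op after a successful move; removes the leftover file on failure.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("atomic-write-sketch");
        Path target = directory.resolve("00000000000000000000.json");
        writeLines(target, Arrays.asList("{\"a\":1}"), false);
        writeLines(target, Arrays.asList("{\"a\":2}"), true);
        System.out.println(Files.readAllLines(target));
    }
}
```

Object stores and distributed filesystems differ in which operations are atomic, which is why the handler delegates to a filesystem-specific LogStore rather than a single rename-based strategy like this one.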
