[WIP][SPARK-55444][SQL] Route TimeType Parquet filter pushdown through the Types Framework #56965
Open
stevomitric wants to merge 1 commit into
…hrough the Types Framework

Follow-up to the Types Framework Phase 3a Parquet work (SPARK-55444). Routes TimeType's Parquet predicate pushdown through the framework instead of the inline ParquetTimeMicrosType handling, so framework types get filter pushdown without per-type changes to ParquetFilters. TimeType was the last Parquet integration point still hardcoded in ParquetFilters (schema/write/row-read/vectorized read already moved to ParquetTypeOps).

- Add a ParquetFilterOps trait carrying a framework type's filter logic: the Parquet encoding it owns (primitive + logical annotation), value acceptance, and the seven predicate builders (eq/notEq/lt/ltEq/gt/gtEq/in).
- Implement TimeTypeParquetOps.filterOps (LocalTime -> micros-of-day Long), identical to the previous inline conversion, exposed via ParquetTypeOps.parquetFilterOps and the ParquetTypeOps.filterOpsFor reverse lookup.
- Replace the scattered ParquetTimeMicrosType arms in ParquetFilters' seven make* PartialFunctions and valueCanMakeFilterOn with a FrameworkFilterOps extractor (defined in ParquetFilters because ParquetSchemaType is private).

Dispatch is keyed on the Parquet file's on-disk encoding (a reverse lookup), not the requested Spark type, because filter pushdown binds predicates to physical columns and the value converter depends on the physical unit -- a forward DataType-keyed dispatch could build a micros-scaled predicate for a TIME(NANOS) file read as TimeType(6). This mirrors the existing physical-schema dispatch and keeps behavior byte-identical: the extractor matches only the canonical INT64 TIME(MICROS, isAdjustedToUTC=false) encoding that ParquetTimeMicrosType matched. NANOS pushdown remains unsupported (unchanged). Timestamp/Decimal stay physical-keyed (not framework types).

Tested: TimeTypeParquetOpsSuite (+4 filter unit tests); ParquetV1FilterSuite and ParquetV2FilterSuite "SPARK-51687: filter pushdown - time" pass unchanged.

Generated-by: Claude Code (Claude Opus 4.8)
Co-authored-by: Isaac
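For readers who have not seen the diff, the trait described in the commit message might look roughly like the sketch below. It is dependency-free on purpose: `Pred`, `Cmp`, `In`, `FilterOps`, the `make*` method names, and `TimeMicrosFilterOps` are all hypothetical stand-ins, not the PR's actual signatures, and the real code would build Parquet filter predicates rather than these case classes. Null handling is omitted.

```scala
import java.time.LocalTime

object FilterOpsSketch {
  // Hypothetical stand-ins for Parquet filter predicates.
  sealed trait Pred
  final case class Cmp(op: String, column: String, value: Long) extends Pred
  final case class In(column: String, values: Set[Long]) extends Pred

  // Assumed shape: value acceptance, a value converter, and seven builders.
  trait FilterOps {
    def accepts(value: Any): Boolean
    def convert(value: Any): Long

    def makeEq(col: String, v: Any): Pred = Cmp("eq", col, convert(v))
    def makeNotEq(col: String, v: Any): Pred = Cmp("notEq", col, convert(v))
    def makeLt(col: String, v: Any): Pred = Cmp("lt", col, convert(v))
    def makeLtEq(col: String, v: Any): Pred = Cmp("ltEq", col, convert(v))
    def makeGt(col: String, v: Any): Pred = Cmp("gt", col, convert(v))
    def makeGtEq(col: String, v: Any): Pred = Cmp("gtEq", col, convert(v))
    def makeIn(col: String, vs: Seq[Any]): Pred = In(col, vs.map(convert).toSet)
  }

  // TIME(MICROS) flavour: a LocalTime becomes microseconds since midnight.
  object TimeMicrosFilterOps extends FilterOps {
    def accepts(value: Any): Boolean = value.isInstanceOf[LocalTime]

    def convert(value: Any): Long = value match {
      case t: LocalTime => t.toNanoOfDay / 1000L // nanos-of-day to micros-of-day
      case other => throw new IllegalArgumentException(s"unsupported value: $other")
    }
  }
}
```

In this sketch, `FilterOpsSketch.TimeMicrosFilterOps.makeEq("t", LocalTime.of(0, 0, 1))` yields `Cmp("eq", "t", 1000000L)`, since one second is 1,000,000 microseconds.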
What changes were proposed in this pull request?
Routes TimeType's Parquet predicate pushdown through the Types Framework instead of the inline ParquetTimeMicrosType handling, which was the last Parquet integration point still hardcoded in ParquetFilters (schema/write/row-read/vectorized-read already moved to ParquetTypeOps).
- Add a ParquetFilterOps trait: the Parquet encoding a framework type owns (primitive + logical annotation), value acceptance, and the 7 predicate builders (eq/notEq/lt/ltEq/gt/gtEq/in).
- Implement TimeTypeParquetOps.filterOps (LocalTime → micros-of-day Long), exposed via ParquetTypeOps.parquetFilterOps + the reverse filterOpsFor lookup.
- Replace the ParquetTimeMicrosType arms in ParquetFilters (7 make* + valueCanMakeFilterOn) with a FrameworkFilterOps extractor.
Dispatch is keyed on the Parquet file's on-disk encoding (reverse lookup), not the Spark type, because filter pushdown binds predicates to physical columns and the value converter depends on the physical unit. This matches the existing physical-schema dispatch.
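The encoding-keyed dispatch can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: `Encoding`, `Ops`, `registered`, `FrameworkOps`, and `pushdownValue` are hypothetical names, and `filterOpsFor` only borrows the name of the lookup mentioned above, not its real signature. The point is that the lookup starts from the file's physical encoding, so a column stored with a different unit simply finds no ops and gets no pushed-down predicate.

```scala
import java.time.LocalTime

object EncodingDispatchSketch {
  // Hypothetical description of a Parquet column's on-disk encoding.
  final case class Encoding(
      primitive: String,
      logical: String,
      unit: String,
      adjustedToUtc: Boolean)

  // The canonical encoding named in the PR: INT64 TIME(MICROS, isAdjustedToUTC=false).
  val TimeMicros: Encoding = Encoding("INT64", "TIME", "MICROS", adjustedToUtc = false)

  // A converter bundled with the one encoding it is valid for.
  final case class Ops(encoding: Encoding, toPhysical: LocalTime => Long)

  val registered: Seq[Ops] = Seq(
    Ops(TimeMicros, t => t.toNanoOfDay / 1000L) // micros-of-day
  )

  // Reverse lookup: from the file's encoding to the ops that own it.
  def filterOpsFor(e: Encoding): Option[Ops] = registered.find(_.encoding == e)

  // Extractor so that match arms can stay free of per-type knowledge.
  object FrameworkOps {
    def unapply(e: Encoding): Option[Ops] = filterOpsFor(e)
  }

  // Returns the physical value to compare against, or None when no pushdown applies.
  def pushdownValue(fileEncoding: Encoding, v: LocalTime): Option[Long] =
    fileEncoding match {
      case FrameworkOps(ops) => Some(ops.toPhysical(v))
      case _ => None
    }
}
```

With this sketch, `pushdownValue(TimeMicros, LocalTime.NOON)` returns `Some(43200000000L)`, while the same call with `TimeMicros.copy(unit = "NANOS")` returns `None`, so a micros-scaled value is never compared against a nanos-encoded column.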
Why are the changes needed?
Framework types get filter pushdown with no per-type changes to ParquetFilters, keeping the per-type filter knowledge with the type. No ParquetFilters constructor change.
Does this PR introduce any user-facing change?
No. The extractor matches only the canonical INT64 TIME(MICROS, isAdjustedToUTC=false) encoding that ParquetTimeMicrosType matched; behavior is identical. NANOS pushdown remains unsupported.
How was this patch tested?
TimeTypeParquetOpsSuite (+4 filter unit tests); ParquetV1FilterSuite / ParquetV2FilterSuite "SPARK-51687: filter pushdown - time" pass unchanged.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)