
[WIP][SPARK-55444][SQL] Route TimeType Parquet filter pushdown through the Types Framework #56965

Open
stevomitric wants to merge 1 commit into apache:master from stevomitric:stevomitric/parquet-tf-filter-pushdown-fw


@stevomitric

Contributor

What changes were proposed in this pull request?

Routes TimeType's Parquet predicate pushdown through the Types Framework instead of the inline ParquetTimeMicrosType handling. This was the last Parquet integration point still hardcoded in ParquetFilters; schema, write, row-read, and vectorized-read handling already moved to ParquetTypeOps.

  • New ParquetFilterOps trait: the Parquet encoding a framework type owns (primitive + logical annotation), value acceptance, and the 7 predicate builders (eq/notEq/lt/ltEq/gt/gtEq/in).
  • TimeTypeParquetOps.filterOps (LocalTime → micros-of-day Long), exposed via ParquetTypeOps.parquetFilterOps + the reverse filterOpsFor lookup.
  • Replace the scattered ParquetTimeMicrosType arms in ParquetFilters (7 make* + valueCanMakeFilterOn) with a FrameworkFilterOps extractor.
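
The trait shape listed above can be pictured with a small stand-alone sketch. It is written in plain Java with no Spark or Parquet dependency, so `FilterOpsSketch`, `TimeFilterOpsSketch`, and every method name below are hypothetical stand-ins, not the PR's actual API. A predicate here is only a test over the stored Long, where the real builders would produce Parquet filter predicates, and dividing nanos-of-day by 1,000 is an assumed way to obtain micros-of-day from a `LocalTime`.

```java
import java.time.LocalTime;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical TimeType implementation: accepts LocalTime, stores micros-of-day.
final class TimeFilterOpsSketch implements FilterOpsSketch {
    @Override
    public boolean acceptsValue(Object value) {
        return value instanceof LocalTime;
    }

    @Override
    public long toStored(Object value) {
        // Assumed conversion: nanos-of-day / 1000, dropping sub-microsecond digits.
        return ((LocalTime) value).toNanoOfDay() / 1_000L;
    }

    public static void main(String[] args) {
        FilterOpsSketch ops = new TimeFilterOpsSketch();
        long stored = ops.toStored(LocalTime.of(9, 30));
        // Evaluate "col <= 12:00" against a row holding 09:30.
        System.out.println(ops.ltEq(LocalTime.NOON).test(stored));
    }
}

// Hypothetical contract: value acceptance, value conversion, and the seven
// predicate builders (eq/notEq/lt/ltEq/gt/gtEq/in) over the stored Long.
interface FilterOpsSketch {
    boolean acceptsValue(Object value);
    long toStored(Object value);

    default Predicate<Long> eq(Object v)    { long x = toStored(v); return s -> s == x; }
    default Predicate<Long> notEq(Object v) { long x = toStored(v); return s -> s != x; }
    default Predicate<Long> lt(Object v)    { long x = toStored(v); return s -> s < x; }
    default Predicate<Long> ltEq(Object v)  { long x = toStored(v); return s -> s <= x; }
    default Predicate<Long> gt(Object v)    { long x = toStored(v); return s -> s > x; }
    default Predicate<Long> gtEq(Object v)  { long x = toStored(v); return s -> s >= x; }

    default Predicate<Long> in(List<?> values) {
        Set<Long> stored = new HashSet<>();
        for (Object v : values) {
            stored.add(toStored(v));
        }
        return stored::contains;
    }
}
```

Keeping the conversion in one method means all seven builders share the same unit handling, which is the property the bullets attribute to keeping filter knowledge with the type.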

Dispatch is keyed on the Parquet file's on-disk encoding (reverse lookup), not the Spark type, because filter pushdown binds predicates to physical columns and the value converter depends on the physical unit — matching the existing physical-schema dispatch.
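
A reverse lookup of this kind might look roughly like the following. This is a sketch in plain Java under stated assumptions: `DiskEncodingSketch`, `TimeUnitSketch`, and this `filterOpsFor` signature are hypothetical, and the returned string stands in for whatever filter-ops object the framework would hand back. The only detail taken from this PR is that the sole encoding matched is INT64 TIME(MICROS, isAdjustedToUTC=false).

```java
import java.util.Optional;

final class ReverseLookupSketch {
    // Hypothetical reverse lookup: on-disk encoding -> name of the owning type's filter ops.
    static Optional<String> filterOpsFor(DiskEncodingSketch e) {
        boolean canonicalTimeMicros =
            e.primitive.equals("INT64")
                && e.unit == TimeUnitSketch.MICROS
                && !e.adjustedToUtc;
        // Anything else (for example NANOS) yields no ops, so nothing is pushed down.
        return canonicalTimeMicros ? Optional.of("TimeType") : Optional.empty();
    }

    public static void main(String[] args) {
        DiskEncodingSketch micros = new DiskEncodingSketch("INT64", TimeUnitSketch.MICROS, false);
        DiskEncodingSketch nanos = new DiskEncodingSketch("INT64", TimeUnitSketch.NANOS, false);
        System.out.println(filterOpsFor(micros).isPresent());
        System.out.println(filterOpsFor(nanos).isPresent());
    }
}

// Hypothetical model of the units a Parquet TIME annotation can carry.
enum TimeUnitSketch { MILLIS, MICROS, NANOS }

// Hypothetical description of a column's physical type and TIME annotation.
final class DiskEncodingSketch {
    final String primitive;
    final TimeUnitSketch unit;
    final boolean adjustedToUtc;

    DiskEncodingSketch(String primitive, TimeUnitSketch unit, boolean adjustedToUtc) {
        this.primitive = primitive;
        this.unit = unit;
        this.adjustedToUtc = adjustedToUtc;
    }
}
```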

Why are the changes needed?

Framework types get filter pushdown without per-type changes to ParquetFilters, which keeps each type's filter knowledge with the type itself. The ParquetFilters constructor is unchanged.

Does this PR introduce any user-facing change?

No. The extractor matches only the canonical INT64 TIME(MICROS, isAdjustedToUTC=false) encoding that ParquetTimeMicrosType matched; behavior is identical. NANOS pushdown remains unsupported.

How was this patch tested?

TimeTypeParquetOpsSuite (+4 filter unit tests); ParquetV1FilterSuite/ParquetV2FilterSuite "SPARK-51687: filter pushdown - time" pass unchanged.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…hrough the Types Framework

Follow-up to the Types Framework Phase 3a Parquet work (SPARK-55444). Routes TimeType's Parquet
predicate pushdown through the framework instead of the inline ParquetTimeMicrosType handling, so
framework types get filter pushdown without per-type changes to ParquetFilters. TimeType was the
last Parquet integration point still hardcoded in ParquetFilters (schema, write, row-read, and
vectorized-read handling already moved to ParquetTypeOps).

- Add a ParquetFilterOps trait carrying a framework type's filter logic: the Parquet encoding it
  owns (primitive + logical annotation), value acceptance, and the seven predicate builders
  (eq/notEq/lt/ltEq/gt/gtEq/in).
- Implement TimeTypeParquetOps.filterOps (LocalTime -> micros-of-day Long), identical to the
  previous inline conversion, exposed via ParquetTypeOps.parquetFilterOps and the
  ParquetTypeOps.filterOpsFor reverse lookup.
- Replace the scattered ParquetTimeMicrosType arms in ParquetFilters' seven make* PartialFunctions
  and valueCanMakeFilterOn with a FrameworkFilterOps extractor (defined in ParquetFilters because
  ParquetSchemaType is private).

Dispatch is keyed on the Parquet file's on-disk encoding (a reverse lookup), not the requested
Spark type, because filter pushdown binds predicates to physical columns and the value converter
depends on the physical unit -- a forward DataType-keyed dispatch could build a micros-scaled
predicate for a TIME(NANOS) file read as TimeType(6). This mirrors the existing physical-schema
dispatch and keeps behavior byte-identical: the extractor matches only the canonical
INT64 TIME(MICROS, isAdjustedToUTC=false) encoding that ParquetTimeMicrosType matched. NANOS
pushdown remains unsupported (unchanged). Timestamp/Decimal stay physical-keyed (not framework
types).
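
The unit hazard described above can be checked with a little arithmetic. The sketch below is plain Java with hypothetical names (`UnitMismatchSketch` and its helpers are illustrative only, not Spark code); it assumes a file that stores nanos-of-day and compares the result of `col < 00:00:01` when the bound is scaled to micros versus nanos.

```java
import java.time.LocalTime;

final class UnitMismatchSketch {
    static long microsOfDay(LocalTime t) { return t.toNanoOfDay() / 1_000L; }
    static long nanosOfDay(LocalTime t)  { return t.toNanoOfDay(); }

    // Evaluates "col < bound" on the raw stored value, as a pushed-down predicate would.
    static boolean lessThan(long storedValue, long bound) { return storedValue < bound; }

    public static void main(String[] args) {
        LocalTime row = LocalTime.of(0, 0, 0, 500_000_000); // 00:00:00.5
        LocalTime literal = LocalTime.of(0, 0, 1);          // 00:00:01

        long storedNanos = nanosOfDay(row); // 500_000_000 in a TIME(NANOS) file

        // Micros-scaled bound (1_000_000) against a nanos value: row is wrongly rejected.
        System.out.println(lessThan(storedNanos, microsOfDay(literal))); // false
        // Nanos-scaled bound (1_000_000_000): row is correctly kept.
        System.out.println(lessThan(storedNanos, nanosOfDay(literal)));  // true
    }
}
```

A row at 00:00:00.5 does satisfy `col < 00:00:01`, so the first result shows how a mis-scaled bound could silently drop matching data.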

Tested: TimeTypeParquetOpsSuite (+4 filter unit tests); ParquetV1FilterSuite and ParquetV2FilterSuite
"SPARK-51687: filter pushdown - time" pass unchanged.

Generated-by: Claude Code (Claude Opus 4.8)

Co-authored-by: Isaac