[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartitioning by pan3793 · Pull Request #56961 · apache/spark

pan3793 · 2026-07-02T15:19:46Z

What changes were proposed in this pull request?

This PR extends UnionExec's output partitioning propagation to include KeyedPartitioning, so that a UNION ALL of KeyedPartitioning children can participate in a SPJ.

Why are the changes needed?

A UNION ALL of two V2 (i.e., Iceberg) bucketed tables currently reports UnknownPartitioning — the unionOutputPartitioning feature only handled HashPartitioning/SinglePartition. Consequently, any join on the partition key over such a union always shuffles, even when both legs are storage-partitioned on the join key. This defeats the SPJ optimization for a common pattern: combining sharded/bucketed sources before a join.

Example — t1, t2, t3 all partitioned by identity(id):

SELECT /*+ MERGE(u, t3) */ u.id, u.data, t3.data
FROM (
  SELECT id, data FROM t1
  UNION ALL
  SELECT id, data FROM t2
) u
JOIN t3 ON u.id = t3.id;

Before: the union reports UnknownPartitioning, so the sort-merge join inserts a shuffle on both sides.
After: the union reports a merged KeyedPartitioning over id; SPJ recognizes both legs as compatible and the join runs without a shuffle (a GroupPartitionsExec coalesces partitions sharing a key when the children's key sets overlap).

Does this PR introduce any user-facing change?

No, it's a perf-only change.

How was this patch tested?

UTs are added.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: GLM 5.2

…oning

[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartiti…

aaeb844

…oning

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartitioning#56961

[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartitioning#56961
pan3793 wants to merge 1 commit into
apache:masterfrom
pan3793:SPARK-57881

pan3793 commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

pan3793 commented Jul 2, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant