Skip to content

[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartitioning#56961

Open
pan3793 wants to merge 1 commit into
apache:masterfrom
pan3793:SPARK-57881
Open

[SPARK-57881][SQL] UnionExec outputPartitioning supports KeyedPartitioning#56961
pan3793 wants to merge 1 commit into
apache:masterfrom
pan3793:SPARK-57881

Conversation

@pan3793

@pan3793 pan3793 commented Jul 2, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

This PR extends UnionExec's output partitioning propagation to include KeyedPartitioning, so that a UNION ALL of KeyedPartitioning children can participate in a SPJ.

Why are the changes needed?

A UNION ALL of two V2 (i.e., Iceberg) bucketed tables currently reports UnknownPartitioning — the unionOutputPartitioning feature only handled HashPartitioning/SinglePartition. Consequently, any join on the partition key over such a union always shuffles, even when both legs are storage-partitioned on the join key. This defeats the SPJ optimization for a common pattern: combining sharded/bucketed sources before a join.

Example — t1, t2, t3 all partitioned by identity(id):

SELECT /*+ MERGE(u, t3) */ u.id, u.data, t3.data
FROM (
  SELECT id, data FROM t1
  UNION ALL
  SELECT id, data FROM t2
) u
JOIN t3 ON u.id = t3.id;
  • Before: the union reports UnknownPartitioning, so the sort-merge join inserts a shuffle on both sides.
  • After: the union reports a merged KeyedPartitioning over id; SPJ recognizes both legs as compatible and the join runs without a shuffle (a GroupPartitionsExec coalesces partitions sharing a key when the children's key sets overlap).

Does this PR introduce any user-facing change?

No, it's a perf-only change.

How was this patch tested?

UTs are added.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: GLM 5.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant