PlannerTPC-H — TPCH-Q02

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
1,620,854
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
3,210,110
3.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
644
644
Rank
Estimation Error
Est Err
Row Operations
Ops
1,897,001
1.9M
Rank
Estimation Error
Est Err
Row Operations
Ops
161,166
161K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
1,608,711
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,614,263
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
468
468
Rank
Estimation Error
Est Err
Row Operations
Ops
14,319
14K
Rank
Estimation Error
Est Err
Row Operations
Ops
644
644
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
815,923
816K
Rank
Estimation Error
Est Err
Row Operations
Ops
826,028
826K
Rank
Estimation Error
Est Err
Row Operations
Ops
468
468
Rank
Estimation Error
Est Err
Row Operations
Ops
177,338
177K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,267
1.3K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
810,825
811K
Rank
Estimation Error
Est Err
Row Operations
Ops
809,588
810K
Rank
Estimation Error
Est Err
Row Operations
Ops
4,308
4.3K
Rank
Estimation Error
Est Err
Row Operations
Ops
17,036
17K
Rank
Estimation Error
Est Err
Row Operations
Ops
644
644
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -       644  SORT s_acctbal, n_name, s_name, p_partkey
       -       644  FILTER (p_size = 25) AND (p_type LIKE '%BRASS') AND (r_name = 'EUROPE') AND (ps_supplycost = )
       -       644  LEFT OUTER JOIN HASH ON p_partkey = p_partkey
       -    118933  │└PROJECT MIN(ps_supplycost)
       -    118933   GROUP BY HASH p_partkey AGGREGATE MIN(ps_supplycost)
       -    161166   PROJECT ps_partkey
       -    161166   INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1   │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -    799921   INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25   │└TABLE SCAN nation
       -    799921   INNER JOIN HASH ON ps_suppkey = s_suppkey
       -     10000   │└TABLE SCAN supplier
       -    800000   TABLE SCAN partsupp WHERE ps_partkey = ps_partkey
       -       644  INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -      3208  INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25  │└TABLE SCAN nation
       -      3208  INNER JOIN HASH ON ps_suppkey = s_suppkey
       -     10000  │└TABLE SCAN supplier
       -      3208  INNER JOIN HASH ON ps_partkey = p_partkey
       -       802  │└TABLE SCAN part WHERE (p_size = 25) AND (p_type LIKE '%BRASS')
       -    800000  TABLE SCAN partsupp
DuckDB
Estimate    Actual  Operator
       -       468  SORT s_acctbal, n_name, s_name, p_partkey
    3984       468  PROJECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    3984       468  FILTER ps_supplycost = SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    3886       468  │└GROUP BY HASH #0 AGGREGATE 
       0       644   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
       -         0   │└SCAN EMPTY
    4366       468   PROJECT min_cost, p_partkey
    4366       468   GROUP BY HASH #0 AGGREGATE MIN(#1)
    8732       644   PROJECT p_partkey, ps_supplycost
    8732       644   INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
   10000      7192    TABLE SCAN supplier
   39026      1871   INNER JOIN HASH ON ps_partkey = p_partkey
    3886         0   │└SCAN MATERIALISED
  800000    796758   TABLE SCAN partsupp WHERE ps_suppkey >= 1
    3984       644  INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033  │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5    TABLE SCAN nation
   10000      7192   TABLE SCAN supplier
   17806      3194  INNER JOIN HASH ON ps_partkey = p_partkey
    4445       799  │└TABLE SCAN part WHERE p_size = 25 AND suffix(p_type,'BRASS')
  800000    796758  TABLE SCAN partsupp WHERE ps_suppkey >= 1
PostgreSQL
Estimate    Actual  Operator
       1       468  SORT s_acctbal, n_name, s_name, p_partkey
       1       468  INNER JOIN HASH ON (p_partkey = ps_partkey) AND ((SubPlan 1) = ps_supplycost)
    1267      1267  │└GROUP BY SIMPLE AGGREGATE MIN(ps_supplycost)
    1267      1267   INNER JOIN LOOP ON r_regionkey = n_regionkey AND (r_regionkey = n_regionkey)
       1         1   │└TABLE SCAN r_min WHERE r_name = 'EUROPE'
    5068      5068   INNER JOIN LOOP ON n_nationkey = s_nationkey AND (n_nationkey = s_nationkey)
      25        25   │└TABLE SCAN n_min
    5068      5068   INNER JOIN LOOP ON s_suppkey = ps_suppkey
       4         4   │└TABLE SCAN ps_min WHERE ps_partkey = p_partkey
    5068      5068   TABLE SEEK s_min
  159984    162630   INNER JOIN HASH ON ps_suppkey = s_suppkey
    2000      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25     TABLE SCAN nation
   10000     10000    TABLE SCAN supplier
  800000    800000   TABLE SCAN partsupp
     817       799  TABLE SCAN part WHERE (p_type LIKE '%BRASS') AND (p_size = 25)
SQL Server
Estimate    Actual  Operator
     206       468  SORT s_acctbal, n_name, s_name, ps_partkey
    1030       468  LIMIT 0
    1396       644  GROUP BY HASH  AGGREGATE 
    1396       644  SORT ps_partkey, ps_supplycost
    1396       644  INNER JOIN HASH ON r_regionkey = n_regionkey
       1         1  │└TABLE SEEK region WHERE r_name = 'EUROPE'
    6983      3196  INNER JOIN HASH ON n_nationkey = s_nationkey
      25        25  │└TABLE SEEK nation
    6983      3196  INNER JOIN MERGE ON s_suppkey = ps_suppkey
   10000     10000  │└TABLE SEEK supplier
    6984      3196  SORT ps_suppkey
    6984      3196  INNER JOIN HASH ON ps_partkey = p_partkey
    1746       799  │└TABLE SEEK part WHERE p_size = 25 AND p_type LIKE '%BRASS'
  800000    800000  TABLE SCAN partsupp WHERE ps_partkey IS NOT NULL AND ps_supplycost IS NOT NULL

Commentary



The correlated Subquery

The fascinating bit about this query is the nested subquery: SELECT MIN(ps_supplycost) FROM tpch.partsupp....

That subquery is again correlated with the outer query via this filter: ps_partkey = p_partkey. But this time, the correlation is more complex: we must execute several joins to find the value that the outer ps_supplycost is looking for.

Consider this expression:

ps_supplycost = (SELECT MIN(ps_supplycost)
                       FROM tpch.partsupp AS ps_min
                       INNER JOIN tpch.supplier AS s_min
                           ON ps_suppkey = s_suppkey
                       INNER JOIN tpch.nation AS n_min
                           ON s_nationkey = n_nationkey
                       INNER JOIN region AS r_min
                           ON n_regionkey = r_regionkey
                       WHERE ps_partkey = p_partkey
                         AND r_name = 'EUROPE')

The key here is decorrelate this query into a join to this decorrelated construct

SELECT ps_partkey, 
       MIN(ps_supplycost) AS min_ps_supplycost
FROM tpch.partsupp AS ps_min
INNER JOIN tpch.supplier AS s_min
    ON ps_suppkey = s_suppkey
INNER JOIN tpch.nation AS n_min
    ON s_nationkey = n_nationkey
INNER JOIN region AS r_min
   ON n_regionkey = r_regionkey
WHERE ps_partkey = p_partkey
    AND r_name = 'EUROPE'
GROUP BY ps_partkey;

This allows us to harvest the filter on r_name and apply it transitively to the join in the outer query.