PlannerTPC-H — TPCH-Q02

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
DuckDB
1,197,599 rows
1.2M
1,201,489 rows
1.2M
346 rows
346
12,706 rows
13K
473 rows
473
0 rows
0
PostgreSQL
614,602 rows
615K
621,442 rows
621K
346 rows
346
133,172 rows
133K
1,145 rows
1.1K
0 rows
0
SQL Server
610,825 rows
611K
614,728 rows
615K
819 rows
819
7,565 rows
7.6K
473 rows
473
0 rows
0

Actual Query Plans

Query Plan per Engine ?
Query Plan
PostgreSQL
Estimate    Actual  Operator
       1       346  SORT s_acctbal, n_name, s_name, p_partkey
       1       346  INNER JOIN HASH ON (p_partkey = ps_partkey) AND ((SubPlan 1) = ps_supplycost)
    1145      1145  │└GROUP BY SIMPLE AGGREGATE MIN(ps_supplycost)
    1145      1145   INNER JOIN LOOP ON r_regionkey = n_regionkey
       1         1   │└TABLE SCAN r_min WHERE r_name = 'EUROPE'
    4580      3435   INNER JOIN LOOP ON n_nationkey = s_nationkey
      25        25   │└TABLE SCAN n_min
    4580      3435   INNER JOIN LOOP ON s_suppkey = ps_suppkey
       4         3   │└TABLE SCAN ps_min WHERE ps_partkey = p_partkey
    3748      3748   TABLE SEEK s_min
  119940    121974   INNER JOIN HASH ON ps_suppkey = s_suppkey
    2000      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25     TABLE SCAN nation
   10000     10000    TABLE SCAN supplier
  600000    600000   TABLE SCAN partsupp
     817       799  TABLE SCAN part WHERE (p_type LIKE'%BRASS') AND (p_size = 25)
DuckDB
Estimate    Actual  Operator
    2981       346  SORT s_acctbal, n_name, s_name, p_partkey
    2981       346  PROJECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    2981       346  FILTER ps_supplycost = SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    2910       346  │└GROUP BY HASH #0 AGGREGATE 
       0       473   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    2910         0   │└SCAN EMPTY
    3151       346   PROJECT MIN(ps_supplycost), p_partkey
    3151       346   GROUP BY HASH #0 AGGREGATE MIN(#1)
    6302       473   PROJECT p_partkey, ps_supplycost
    6302       473   INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
   10000      7192    TABLE SCAN supplier
   28228      1383   INNER JOIN HASH ON ps_partkey = p_partkey
    2910         0   │└SCAN MATERIALISED
  600000    591202   TABLE SCAN partsupp WHERE ps_suppkey >= 1
    2981       473  INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      2033  │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5    TABLE SCAN nation
   10000      7192   TABLE SCAN supplier
   13354      2362  INNER JOIN HASH ON ps_partkey = p_partkey
    4445       591  │└FILTER p_partkey <= 150000
    4445       799   TABLE SCAN part WHERE p_size = 25 AND suffix(p_type,'BRASS')
  600000    591202  TABLE SCAN partsupp WHERE ps_suppkey >= 1
SQL Server
Estimate    Actual  Operator
     231       346  SORT s_acctbal, n_name, s_name, ps_partkey
     751       346  LIMIT 0
    1694       473  GROUP BY HASH  AGGREGATE 
    1694       473  SORT ps_partkey, ps_supplycost
    1694       473  INNER JOIN HASH ON r_regionkey = n_regionkey
       1         1  │└TABLE SEEK region WHERE r_name = 'EUROPE'
    8469      2364  INNER JOIN HASH ON n_nationkey = s_nationkey
      25        25  │└TABLE SEEK nation
    8469      2364  INNER JOIN HASH ON s_suppkey = ps_suppkey
    8470      2364  │└INNER JOIN HASH ON ps_partkey = p_partkey
    1828       799   │└TABLE SEEK part WHERE p_size = 25 AND p_type LIKE '%BRASS'
  600000    600000   TABLE SCAN partsupp
   10000     10000  TABLE SEEK supplier

Commentary



The correlated Subquery

The fascinating bit about this query is the nested subquery: SELECT MIN(ps_supplycost) FROM tpch.partsupp....

That subquery is again correlated with the outer query via this filter: ps_partkey = p_partkey. But this time, the correlation is more complex: we must execute several joins to find the value that the outer ps_supplycost is looking for.

Consider this expression:

ps_supplycost = (SELECT MIN(ps_supplycost)
                       FROM tpch.partsupp AS ps_min
                       INNER JOIN tpch.supplier AS s_min
                           ON ps_suppkey = s_suppkey
                       INNER JOIN tpch.nation AS n_min
                           ON s_nationkey = n_nationkey
                       INNER JOIN region AS r_min
                           ON n_regionkey = r_regionkey
                       WHERE ps_partkey = p_partkey
                         AND r_name = 'EUROPE')

The key here is decorrelate this query into a join to this decorrelated construct

SELECT ps_partkey, 
       MIN(ps_supplycost) AS min_ps_supplycost
FROM tpch.partsupp AS ps_min
INNER JOIN tpch.supplier AS s_min
    ON ps_suppkey = s_suppkey
INNER JOIN tpch.nation AS n_min
    ON s_nationkey = n_nationkey
INNER JOIN region AS r_min
   ON n_regionkey = r_regionkey
WHERE ps_partkey = p_partkey
    AND r_name = 'EUROPE'
GROUP BY ps_partkey;

This allows us to harvest the filter on r_name and apply it transitively to the join in the outer query.