PlannerTPC-H — TPCH-Q02

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
1,620,855
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,090,392
1.1M
Rank
Estimation Error
Est Err
Row Operations
Ops
495
495
Rank
Estimation Error
Est Err
Row Operations
Ops
2,404,787
2.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
158,960
159K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
1,606,213
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,724,196
1.7M
Rank
Estimation Error
Est Err
Row Operations
Ops
495
495
Rank
Estimation Error
Est Err
Row Operations
Ops
164,116
164K
Rank
Estimation Error
Est Err
Row Operations
Ops
276,382
276K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
1,620,960
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,626,036
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
495
495
Rank
Estimation Error
Est Err
Row Operations
Ops
14,334
14K
Rank
Estimation Error
Est Err
Row Operations
Ops
689
689
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
177,353
177K
Rank
Estimation Error
Est Err
Row Operations
Ops
176,523
177K
Rank
Estimation Error
Est Err
Row Operations
Ops
177,351
177K
Rank
Estimation Error
Est Err
Row Operations
Ops
495
495
Rank
Estimation Error
Est Err
Row Operations
Ops
331,671
332K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,375
1.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
4,427
4.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
5,156
5.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
495
495
Rank
Estimation Error
Est Err
Row Operations
Ops
6,762
6.8K
Rank
Estimation Error
Est Err
Row Operations
Ops
689
689
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -       495  PROJECT s_acctbal, s_name, n_name, p_partkey_right AS p_partkey, p_mfgr, s_address, s_phone, s_comment
       -       495  SORT s_acctbal, n_name, s_name, p_partkey_right
       -       495  PROJECT s_acctbal, n_name, s_name, p_partkey_right, p_mfgr, s_address, s_phone, s_comment
       -       495  FILTER ps_supplycost = a6
       -       689  PROJECT a6, ps_supplycost, p_partkey_right, p_mfgr, s_acctbal, s_name, s_address, s_phone, s_comment, n_name
       -       689  INNER JOIN HASH ON PROJECTION_28.ps_suppkey = PROJECTION_13.s_suppkey
       -      1987  │└PROJECT s_suppkey, s_acctbal, s_name, s_address, s_phone, s_comment, n_name
       -      1987   PROJECT s_acctbal, s_name, s_address, s_phone, s_comment, s_suppkey, n_name
       -      1987   INNER JOIN HASH ON PROJECTION_25.s_nationkey = PROJECTION_16.n_nationkey
       -         5   │└PROJECT n_nationkey, n_name
       -         5    PROJECT n_name, n_nationkey
       -         5    INNER JOIN HASH ON PROJECTION_22.n_regionkey = PROJECTION_19.r_regionkey
       -         1    │└PROJECT r_regionkey
       -         1     PROJECT r_regionkey
       -         1     TABLE SCAN region WHERE r_name = 'EUROPE'
       -        25    PROJECT n_regionkey, n_name, n_nationkey
       -        25    PROJECT n_name, n_nationkey, n_regionkey
       -        25    TABLE SCAN nation
       -     10000   PROJECT s_nationkey, s_acctbal, s_name, s_address, s_phone, s_comment, s_suppkey
       -     10000   PROJECT s_acctbal, s_name, s_address, s_phone, s_comment, s_suppkey, s_nationkey
       -     10000   TABLE SCAN supplier
       -      1980  PROJECT ps_suppkey, a6, ps_supplycost, p_partkey, p_mfgr
       -      1980  PROJECT ps_suppkey, ps_supplycost, p_partkey, p_mfgr, a6
       -      1980  INNER JOIN HASH ON PROJECTION_34.p_partkey = PROJECTION_31.ps_partkey
       -    800000  │└PROJECT ps_partkey, ps_suppkey, ps_supplycost
       -    800000   PROJECT ps_partkey, ps_suppkey, ps_supplycost
       -    800000   TABLE SCAN partsupp
       -       495  PROJECT p_partkey, p_mfgr, a6
       -       495  PROJECT p_partkey, p_mfgr, a6
       -       495  INNER JOIN HASH ON PROJECTION_40.ps_partkey = PROJECTION_37.p_partkey
       -       803  │└PROJECT p_partkey AS p_partkey_right, p_mfgr
       -       803   PROJECT p_partkey, p_mfgr
       -       803   TABLE SCAN part WHERE (p_size = 25) AND endsWith(p_type,'BRASS')
       -    117422  PROJECT p_partkey, a6
       -    117422  PROJECT a7 AS a6, ps_partkey AS p_partkey
       -    117422  AGGREGATE MIN(ps_supplycost) AS a7 GROUP BY HASH ps_partkey
       -    158960  PROJECT ps_supplycost, ps_partkey, r_name, ps_partkey
       -    158960  PROJECT ps_supplycost, ps_partkey, r_name
       -    158960  INNER JOIN HASH ON PROJECTION_55.n_nationkey = PROJECTION_46.s_nationkey
       -    800000  │└PROJECT s_nationkey, ps_supplycost, ps_partkey
       -    800000   PROJECT ps_supplycost, ps_partkey, s_nationkey
       -    800000   INNER JOIN HASH ON PROJECTION_52.ps_suppkey = PROJECTION_49.s_suppkey
       -     10000   │└PROJECT s_suppkey, s_nationkey
       -     10000    PROJECT s_suppkey, s_nationkey
       -     10000    TABLE SCAN supplier
       -    800000   PROJECT ps_suppkey, ps_supplycost, ps_partkey
       -    800000   PROJECT ps_suppkey, ps_partkey, ps_supplycost
       -    800000   TABLE SCAN partsupp WHERE ps_partkey = ps_partkey
       -         5  PROJECT n_nationkey, r_name
       -         5  PROJECT n_nationkey, r_name
       -         5  INNER JOIN HASH ON PROJECTION_61.n_regionkey = PROJECTION_58.r_regionkey
       -         1  │└PROJECT r_regionkey, r_name
       -         1   PROJECT r_regionkey, r_name
       -         1   TABLE SCAN region WHERE r_name = 'EUROPE'
       -        25  PROJECT n_regionkey, n_nationkey
       -        25  PROJECT n_nationkey, n_regionkey
       -        25  TABLE SCAN nation
DuckDB
Estimate    Actual  Operator
       -       495  SORT s_acctbal, n_name, s_name, p_partkey
    3589       495  PROJECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    3589       495  FILTER ps_supplycost = SUBQUERY
       0         0  LEFT SEMI JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    3479       495  │└AGGREGATE  GROUP BY HASH #3
       0       689   LEFT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
    4628       495   │└PROJECT min_cost, p_partkey
    4628       495    AGGREGATE MIN(#1) GROUP BY HASH #0
    9256       689    PROJECT p_partkey, ps_supplycost
    9256       689    INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      1987    │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5     │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1      │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25      TABLE SCAN nation
   10000     10000     TABLE SCAN supplier
   48134      1980    INNER JOIN HASH ON ps_partkey = p_partkey
    3479         0    │└SCAN MATERIALISED
  800000    799708    TABLE SCAN partsupp
       0       689   SCAN MATERIALISED
    3589       689  INNER JOIN HASH ON ps_suppkey = s_suppkey
    1923      1987  │└INNER JOIN HASH ON s_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25    TABLE SCAN nation
   10000     10000   TABLE SCAN supplier
   18667      3212  INNER JOIN HASH ON ps_partkey = p_partkey
    4445       803  │└TABLE SCAN part WHERE p_size = 25 AND suffix(p_type,'BRASS')
  800000    799708  TABLE SCAN partsupp
Databricks
Estimate    Actual  Operator
   16000       495  SORT s_acctbal DESC NULLS LAST, n_name ASC NULLS FIRST, s_name ASC NULLS FIRST, p_partkey ASC NULLS FIRST
   16000       495  DISTRIBUTE HASH ON s_acctbal DESC NULLS LAST, n_name ASC NULLS FIRST, s_name ASC NULLS FIRST, p_partkey ASC NULLS FIRST
   16000       495  INNER JOIN HASH ON ps_suppkey = s_suppkey
   16000       495  │└DISTRIBUTE GATHER
    3850       495   INNER JOIN HASH ON p_partkey = ps_partkey
    3850       495   │└DISTRIBUTE GATHER
    2000       495    INNER JOIN HASH ON p_partkey = ps_min.ps_partkey
    2000       803    │└DISTRIBUTE GATHER
  800000       803     TABLE SCAN part WHERE (p_size = 25L) AND endswith(p_type,'BRASS'collate UTF8_BINARY)
    3850    117422    FILTER min_cost IS NOT NULL
   40300    117422    AGGREGATE MIN(ps_min.ps_supplycost) GROUP BY HASH ps_min.ps_partkey
   40300    117422    DISTRIBUTE HASH ON ps_min.ps_partkey
       5    117422    AGGREGATE MIN(ps_min.ps_supplycost) GROUP BY HASH ps_min.ps_partkey
       5    158960    INNER JOIN HASH ON s_min.s_suppkey = ps_min.ps_suppkey
       5      1987    │└DISTRIBUTE GATHER
    2000      1987     INNER JOIN HASH ON n_min.n_nationkey = s_min.s_nationkey
    2000         5     │└DISTRIBUTE GATHER
       5         5      INNER JOIN HASH ON n_min.n_regionkey = r_min.r_regionkey
       5         5      │└TABLE SCAN nation
       5         1      DISTRIBUTE GATHER
   10000         1      TABLE SCAN region WHERE r_min.r_name = 'EUROPE'collate UTF8_BINARY
      25      6704     TABLE SCAN supplier
  800000    796702    TABLE SCAN partsupp
      25    795909   TABLE SCAN partsupp
   16000      1366  INNER JOIN HASH ON n_nationkey = s_nationkey
   16000         5  │└DISTRIBUTE GATHER
  167000         5   INNER JOIN HASH ON n_regionkey = r_regionkey
       5         5   │└TABLE SCAN nation
   10000         1   TABLE SCAN region WHERE r_min.r_name = 'EUROPE'collate UTF8_BINARY
  200000      6083  TABLE SCAN supplier
PostgreSQL
Estimate    Actual  Operator
       1       495  SEQUENCE
    1298      1298  ├─AGGREGATE MIN(ps_supplycost)
    1298      1375   INNER JOIN LOOP ON ps_partkey = p_partkey AND n_regionkey = r_regionkey AND (n_regionkey = r_regionkey)
       1         1   │└TABLE SCAN region AS r_min WHERE r_min.r_name = 'EUROPE'
    5192      5192   INNER JOIN LOOP ON n_nationkey = s_nationkey
    5192      5192   │└INNER JOIN LOOP ON s_suppkey = ps_suppkey
    5192      5192    │└TABLE SEEK partsupp AS ps_min
    5192      5192    TABLE SEEK supplier AS s_min
    5192      5192   TABLE SEEK nation AS n_min
       1       495  └─SORT s_acctbal, n_name, s_name, p_partkey
       1       495    INNER JOIN HASH ON (p_partkey = ps_partkey) AND ((SubPlan 1) = ps_supplycost)
  160000    158960    │└INNER JOIN LOOP ON ps_suppkey = s_suppkey
    2000      1987     │└INNER JOIN LOOP ON s_nationkey = n_nationkey
       5         5      │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1       │└TABLE SCAN region AS region WHERE r_name = 'EUROPE'
      25        25       TABLE SCAN nation AS nation
    2000      1987      TABLE SEEK supplier AS supplier
  158960    158960     TABLE SEEK partsupp AS partsupp
     945       803    TABLE SCAN part AS part WHERE (p_type LIKE '%BRASS') AND (p_size = 25)
SQL Server
Estimate    Actual  Operator
     210       495  SORT s_acctbal, n_name, s_name, ps_partkey
     210       495  INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
     105       495  INNER JOIN HASH ON s_nationkey = n_nationkey
       2         5  │└TABLE SCAN nation WHERE BLOOM(n_regionkey)
     105       495  INNER JOIN HASH ON s_suppkey = ps_suppkey
    1052       495  │└INNER JOIN HASH ON Expr1023 = ps_supplycost AND ps_partkey = ps_partkey as ps_partkey
    1052       495   │└INNER JOIN HASH ON ps_min.ps_partkey = p_partkey
    1771       803    │└TABLE SCAN part WHERE p_type LIKE '%BRASS' AND p_size = 25
    1175       495    AGGREGATE MIN(ps_supplycost as ps_supplycost) AS Expr1023 GROUP BY HASH ps_partkey
    1600       689    INNER JOIN HASH ON ps_min.ps_suppkey = s_min.s_suppkey
    2000      1987    │└INNER JOIN HASH ON s_min.s_nationkey = n_min.n_nationkey
       5         5     │└INNER JOIN HASH ON n_min.n_regionkey = r_min.r_regionkey
       1         1      │└TABLE SCAN region AS r_min WHERE r_name as r_name = 'EUROPE'
       2         5      TABLE SCAN nation AS n_min WHERE BLOOM(n_regionkey as n_regionkey)
    1000      1987     TABLE SCAN supplier AS s_min WHERE BLOOM(s_nationkey as s_nationkey)
     800       689    TABLE SCAN partsupp AS ps_min WHERE BLOOM(ps_suppkey as ps_suppkey) AND BLOOM(ps_partkey as ps_partkey)
     800       495   TABLE SCAN partsupp WHERE (BLOOM(ps_partkey)) AND (BLOOM(ps_supplycost,ps_partkey))
     100       441  TABLE SCAN supplier WHERE BLOOM(s_suppkey) AND BLOOM(s_nationkey)

Commentary

The correlated Subquery

The fascinating bit about this query is the nested subquery: SELECT MIN(ps_supplycost) FROM tpch.partsupp....

That subquery is again correlated with the outer query via this filter: ps_partkey = p_partkey. But this time, the correlation is more complex: we must execute several joins to find the value that the outer ps_supplycost is looking for.

Consider this expression:

ps_supplycost = (SELECT MIN(ps_supplycost)
                       FROM tpch.partsupp AS ps_min
                       INNER JOIN tpch.supplier AS s_min
                           ON ps_suppkey = s_suppkey
                       INNER JOIN tpch.nation AS n_min
                           ON s_nationkey = n_nationkey
                       INNER JOIN region AS r_min
                           ON n_regionkey = r_regionkey
                       WHERE ps_partkey = p_partkey
                         AND r_name = 'EUROPE')

The key here is decorrelate this query into a join to this decorrelated construct

SELECT ps_partkey, 
       MIN(ps_supplycost) AS min_ps_supplycost
FROM tpch.partsupp AS ps_min
INNER JOIN tpch.supplier AS s_min
    ON ps_suppkey = s_suppkey
INNER JOIN tpch.nation AS n_min
    ON s_nationkey = n_nationkey
INNER JOIN region AS r_min
   ON n_regionkey = r_regionkey
WHERE ps_partkey = p_partkey
    AND r_name = 'EUROPE'
GROUP BY ps_partkey;

This allows us to harvest the filter on r_name and apply it transitively to the join in the outer query.