PlannerTPC-H — TPCH-Q09

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
8,322,147
8.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
8,638,363
8.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
175
175
Rank
Estimation Error
Est Err
Row Operations
Ops
12,666,726
13M
Rank
Estimation Error
Est Err
Row Operations
Ops
327,148
327K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
8,314,398
8.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,457,762
7.5M
Rank
Estimation Error
Est Err
Row Operations
Ops
175
175
Rank
Estimation Error
Est Err
Row Operations
Ops
12,651,228
13M
Rank
Estimation Error
Est Err
Row Operations
Ops
327,498
327K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
8,322,111
8.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,465,475
7.5M
Rank
Estimation Error
Est Err
Row Operations
Ops
175
175
Rank
Estimation Error
Est Err
Row Operations
Ops
3,437,148
3.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
327,148
327K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
1,891,770
1.9M
Rank
Estimation Error
Est Err
Row Operations
Ops
370,838
371K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,929,008
1.9M
Rank
Estimation Error
Est Err
Row Operations
Ops
327,147
327K
Rank
Estimation Error
Est Err
Row Operations
Ops
727,009
727K
Rank
Estimation Error
Est Err
Row Operations
Ops
327,672
328K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
685,198
685K
Rank
Estimation Error
Est Err
Row Operations
Ops
751,552
752K
Rank
Estimation Error
Est Err
Row Operations
Ops
175
175
Rank
Estimation Error
Est Err
Row Operations
Ops
751,552
752K
Rank
Estimation Error
Est Err
Row Operations
Ops
327,148
327K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -       175  PROJECT nation, o_year, a1 AS sum_profit
       -       175  SORT nation, o_year
       -       175  PROJECT nation, o_year, a1
       -       175  AGGREGATE SUM(amount) AS a1 GROUP BY HASH nation, o_year
       -    327148  PROJECT n_name AS nation, EXTRACT(YEAR FROM o_orderdate) AS o_year, (l_extendedprice * (1 - l_discount)) - (ps_supplycost * l_quantity) AS amount
       -    327148  PROJECT l_extendedprice, l_discount, l_quantity, o_orderdate, ps_supplycost, n_name
       -    327148  INNER JOIN HASH ON PROJECTION_259.o_orderkey = PROJECTION_232.l_orderkey
       -    327148  │└PROJECT l_orderkey, l_extendedprice, l_discount, l_quantity, ps_supplycost, n_name
       -    327148   PROJECT l_extendedprice, l_discount, l_quantity, l_orderkey, ps_supplycost, n_name
       -    327148   INNER JOIN HASH ON tuple(PROJECTION_256.ps_suppkey,PROJECTION_256.ps_partkey) = tuple(PROJECTION_235.l_suppkey,PROJECTION_235.l_partkey)
       -    327148   │└PROJECT l_suppkey, l_partkey, l_extendedprice, l_discount, l_quantity, l_orderkey, n_name
       -    327148    PROJECT l_extendedprice, l_discount, l_quantity, l_orderkey, l_suppkey, l_partkey, n_name
       -    327148    INNER JOIN HASH ON PROJECTION_253.p_partkey = PROJECTION_238.l_partkey
       -   6001215    │└PROJECT l_partkey, l_extendedprice, l_discount, l_quantity, l_orderkey, l_suppkey, n_name
       -   6001215     PROJECT l_extendedprice, l_discount, l_quantity, l_orderkey, l_suppkey, l_partkey, n_name
       -   6001215     INNER JOIN HASH ON PROJECTION_250.l_suppkey = PROJECTION_241.s_suppkey
       -     10000     │└PROJECT s_suppkey, n_name
       -     10000      PROJECT s_suppkey, n_name
       -     10000      INNER JOIN HASH ON PROJECTION_247.s_nationkey = PROJECTION_244.n_nationkey
       -        25      │└PROJECT n_nationkey, n_name
       -        25       PROJECT n_name, n_nationkey
       -        25       TABLE SCAN nation
       -     10000      PROJECT s_nationkey, s_suppkey
       -     10000      PROJECT s_suppkey, s_nationkey
       -     10000      TABLE SCAN supplier
       -   6001215     PROJECT l_suppkey, l_extendedprice, l_discount, l_quantity, l_orderkey, l_partkey
       -   6001215     PROJECT l_extendedprice, l_discount, l_quantity, l_orderkey, l_suppkey, l_partkey
       -   6001215     TABLE SCAN lineitem
       -     10907    PROJECT p_partkey
       -     10907    PROJECT p_partkey
       -     10907    TABLE SCAN part WHERE p_name LIKE '%lace%'
       -    800000   PROJECT ps_suppkey, ps_partkey, ps_supplycost
       -    800000   PROJECT ps_supplycost, ps_suppkey, ps_partkey
       -    800000   TABLE SCAN partsupp
       -   1500000  PROJECT o_orderkey, o_orderdate
       -   1500000  PROJECT o_orderdate, o_orderkey
       -   1500000  TABLE SCAN orders
DuckDB
Estimate    Actual  Operator
       -       175  SORT profit.nation, profit.o_year
 4934591       175  AGGREGATE SUM(#2) GROUP BY HASH #0, #1
 5118003    327148  PROJECT nation, o_year, amount
 5118003    327148  PROJECT nation, o_year, amount
 5118003    327148  INNER JOIN HASH ON l_suppkey = s_suppkey AND p_partkey = ps_partkey
  769230    800000  │└INNER JOIN HASH ON ps_suppkey = s_suppkey
    9615     10000   │└INNER JOIN HASH ON s_nationkey = n_nationkey
      25        25    │└TABLE SCAN nation
   10000     10000    TABLE SCAN supplier
  800000    800000   TABLE SCAN partsupp
 1269043    327148  INNER JOIN HASH ON l_orderkey = o_orderkey
 1500000   1500000  │└TABLE SCAN orders
 1258538    327148  INNER JOIN HASH ON l_partkey = p_partkey
   40000     10907  │└TABLE SCAN part WHERE contains(p_name,'lace')
 6001215   6001179  TABLE SCAN lineitem
Databricks
Estimate    Actual  Operator
 6000000       175  SORT profit.nation ASC NULLS FIRST, profit.o_year DESC NULLS LAST
 6000000       175  AGGREGATE SUM(profit.amount) GROUP BY HASH profit.nation, profit.o_year
 6000000       350  DISTRIBUTE HASH ON profit.nation, profit.o_year
 6000000       350  AGGREGATE SUM(profit.amount) GROUP BY HASH profit.nation, profit.o_year
 6000000    327148  INNER JOIN HASH ON l_suppkey = s_suppkey
 6000000    327148  │└INNER JOIN HASH ON l_suppkey = ps_suppkey
 6000000    327148   │└INNER JOIN HASH ON l_partkey = p_partkey
   10000   5993466    │└INNER JOIN HASH ON l_orderkey = o_orderkey
  800000   5993466     │└TABLE SCAN lineitem
   10000   1500000     DISTRIBUTE GATHER
      25   1500000     TABLE SCAN orders
 6000000     10907    DISTRIBUTE GATHER
   10000     10907    TABLE SCAN part WHERE contains(p_name,'lace'collate UTF8_BINARY)
 6000000    800000   DISTRIBUTE GATHER
 6000000    800000   TABLE SCAN partsupp
 6000000     10000  DISTRIBUTE GATHER
 6000000     10000  INNER JOIN HASH ON s_nationkey = n_nationkey
 1500000     10000  │└TABLE SCAN supplier
 6000000        25  DISTRIBUTE GATHER
  200000        25  TABLE SCAN nation
PostgreSQL
Estimate    Actual  Operator
   60150       175  AGGREGATE SUM((l_extendedprice * ('1' - l_discount)) - (ps_supplycost * l_quantity)) GROUP BY SORT n_name, EXTRACT(year FROM o_orderdate)
  180450       525  AGGREGATE PARTIAL SUM((l_extendedprice * ('1' - l_discount)) - (ps_supplycost * l_quantity)) GROUP BY SORT n_name, EXTRACT(year FROM o_orderdate)
  454629    327147  SORT n_name, EXTRACT(year FROM o_orderdate)
  151543    109049  INNER JOIN HASH ON o_orderkey = l_orderkey
  454629    327147  │└INNER JOIN LOOP ON (l_partkey = ps_partkey) AND (l_suppkey = ps_suppkey)
   20202     14542   │└INNER JOIN HASH ON s_nationkey = n_nationkey
      25        25    │└TABLE SCAN nation AS nation
   20202     14542    INNER JOIN HASH ON ps_suppkey = s_suppkey
   10000     10000    │└TABLE SCAN supplier AS supplier
   60606     43628    INNER JOIN LOOP ON ps_partkey = p_partkey
   15150     10907    │└TABLE SCAN part AS part WHERE p_name LIKE '%lace%'
   43628     43628    TABLE SEEK partsupp AS partsupp
   43628    327210   TABLE SEEK lineitem AS lineitem
 1875000   1500000  TABLE SCAN orders AS orders
SQL Server
Estimate    Actual  Operator
     175       175  SORT n_name, Expr1018
     175       175  AGGREGATE SUM(Expr1029) AS Expr1020 GROUP BY HASH n_name, Expr1018
   41992    327148  PROJECT Expr1021 - ps_supplycost * l_quantity AS Expr1029
   41992    327148  INNER JOIN HASH ON o_orderkey = l_orderkey
   42438    327148  │└INNER JOIN HASH ON l_suppkey = s_suppkey AND l_partkey = p_partkey
   42377     43628   │└INNER JOIN HASH ON ps_partkey = p_partkey
   10481     10907    │└TABLE SCAN part WHERE p_name LIKE '%lace%'
   80000     43628    INNER JOIN HASH ON ps_suppkey = s_suppkey
   10000     10000    │└INNER JOIN HASH ON s_nationkey = n_nationkey
      25        25     │└TABLE SCAN nation
   10000     10000     TABLE SCAN supplier
   80000     43628    TABLE SCAN partsupp WHERE BLOOM(ps_partkey)
   60012    327148   PROJECT l_extendedprice * (1. - l_discount) AS Expr1021
   60012    327148   TABLE SCAN lineitem WHERE BLOOM(l_suppkey) AND BLOOM(l_partkey)
   15000    293490  PROJECT datepart(year,o_orderdate) AS Expr1018
   15000    293490  TABLE SCAN orders WHERE BLOOM(o_orderkey)

Commentary

Composite Key Joins

Estimating join cardinality on multiple columns (like l_partkey and l_suppkey) is particularly challenging for query optimizers. Without composite statistics (which are often not automatically collected), optimizers may get these estimates wildly wrong, leading to sub-optimal plans.

Aggregate before Join

A powerful optimization for Query 9 is to aggregate the data before joining to the nation table. By grouping by the nation key early, the number of join operations to resolve the nation name can be reduced by several orders of magnitude.

Transitive Closure

Optimizers can use transitive closure of the keys to infer additional filters. For example, a filter on part.p_name also implies a filter on partsupp and lineitem, allowing these tables to be reduced before joining.