PlannerTPC-H — TPCH-Q03

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
3,999,030
4M
Rank
Estimation Error
Est Err
Row Operations
Ops
3,393,107
3.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
11,281
11K
Rank
Estimation Error
Est Err
Row Operations
Ops
757,254
757K
Rank
Estimation Error
Est Err
Row Operations
Ops
29,857
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
3,997,498
4M
Rank
Estimation Error
Est Err
Row Operations
Ops
975,174
975K
Rank
Estimation Error
Est Err
Row Operations
Ops
11,281
11K
Rank
Estimation Error
Est Err
Row Operations
Ops
870,989
871K
Rank
Estimation Error
Est Err
Row Operations
Ops
3,264,338
3.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
809,889
810K
Rank
Estimation Error
Est Err
Row Operations
Ops
779,940
780K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
290,432
290K
Rank
Estimation Error
Est Err
Row Operations
Ops
29,857
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
902,470
902K
Rank
Estimation Error
Est Err
Row Operations
Ops
145,216
145K
Rank
Estimation Error
Est Err
Row Operations
Ops
872,521
873K
Rank
Estimation Error
Est Err
Row Operations
Ops
29,866
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
96,810
97K
Rank
Estimation Error
Est Err
Row Operations
Ops
29,856
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
205,022
205K
Rank
Estimation Error
Est Err
Row Operations
Ops
156,497
156K
Rank
Estimation Error
Est Err
Row Operations
Ops
10
10
Rank
Estimation Error
Est Err
Row Operations
Ops
290,432
290K
Rank
Estimation Error
Est Err
Row Operations
Ops
29,857
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -        10  PROJECT l_orderkey, a1 AS revenue, o_orderdate, o_shippriority
       -        10  LIMIT 10
       -     11281  SORT a1, o_orderdate
       -     11281  PROJECT a1, o_orderdate, l_orderkey, o_shippriority
       -     11281  AGGREGATE SUM(a5) AS a1 GROUP BY HASH l_orderkey, o_orderdate, o_shippriority
       -     29857  PROJECT l_orderkey, o_orderdate, o_shippriority, l_extendedprice * (1 - l_discount) AS a5
       -     29857  PROJECT l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority
       -     29857  INNER JOIN HASH ON PROJECTION_75.o_custkey = PROJECTION_72.c_custkey
       -     29949  │└PROJECT c_custkey
       -     29949   PROJECT c_custkey
       -     29949   TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'
       -    151331  PROJECT o_custkey, l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority
       -    151331  PROJECT l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey
       -    151331  INNER JOIN HASH ON PROJECTION_81.l_orderkey = PROJECTION_78.o_orderkey
       -    727305  │└PROJECT o_orderkey, o_orderdate, o_shippriority, o_custkey
       -    727305   PROJECT o_orderkey, o_orderdate, o_shippriority, o_custkey
       -    727305   TABLE SCAN orders WHERE o_orderdate < '1995-03-15'
       -   3241776  PROJECT l_orderkey, l_extendedprice, l_discount
       -   3241776  PROJECT l_orderkey, l_extendedprice, l_discount
       -   3241776  TABLE SCAN lineitem WHERE l_shipdate > '1995-03-15'
DuckDB
Estimate    Actual  Operator
      10        10  LIMIT 10
      10         ∞  SORT SUM(l_extendedprice * (1 - l_discount)), o_orderdate
   63865     11281  PROJECT l_orderkey, revenue, o_orderdate, o_shippriority
   63865     11281  AGGREGATE SUM(#3) GROUP BY HASH #0, #1, #2
   63866     29857  PROJECT l_orderkey, o_orderdate, o_shippriority, l_extendedprice * (1.00 - l_discount)
   63866     29857  INNER JOIN HASH ON l_orderkey = o_orderkey
   79155    145216  │└INNER JOIN HASH ON o_custkey = c_custkey
   37500     29949   │└FILTER c_custkey <= 149999
   37500     29949    TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'
  300000    727292   TABLE SCAN orders WHERE o_orderdate < '1995-03-15'
 1200243     52648  TABLE SCAN lineitem WHERE l_shipdate > '1995-03-15'
Databricks
Estimate    Actual  Operator
      10        10  LIMIT 10
 1830000     11281  SORT revenue DESC NULLS LAST, o_orderdate ASC NULLS FIRST
 1830000     11281  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH l_orderkey, o_orderdate, o_shippriority
 1830000     11281  DISTRIBUTE HASH ON l_orderkey, o_orderdate, o_shippriority
 1830000     11281  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH l_orderkey, o_orderdate, o_shippriority
 1830000     11281  INNER JOIN HASH ON o_orderkey = l_orderkey
 1830000    145216  │└DISTRIBUTE GATHER
  451000    145216   INNER JOIN HASH ON o_custkey = c_custkey
  150000    725773   │└TABLE SCAN orders WHERE o_orderdate < DATE'1995-03-15'
  451000     29949   DISTRIBUTE GATHER
 6000000     29949   TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'collate UTF8_BINARY
 3230000    829958  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH l_orderkey
 1500000   3241776  TABLE SCAN lineitem WHERE l_shipdate > DATE'1995-03-15'
PostgreSQL
Estimate    Actual  Operator
      10        10  LIMIT 10
  318752        10  SORT SUM(l_extendedprice * ('1' - l_discount)), o_orderdate
  318752     11281  AGGREGATE SUM(l_extendedprice * ('1' - l_discount)) GROUP BY SORT l_orderkey, o_orderdate, o_shippriority
  398439     29856  SORT l_orderkey, o_orderdate, o_shippriority
  398439     29856  INNER JOIN LOOP ON l_orderkey = o_orderkey
   61994     48405  │└INNER JOIN HASH ON o_custkey = c_custkey
   38268     29949   │└TABLE SCAN customer AS customer WHERE c_mktsegment = 'MACHINERY'
  911229    727305   TABLE SCAN orders AS orders WHERE o_orderdate < '1995-03-15'
 1016512    145216  TABLE SEEK lineitem AS lineitem WHERE l_shipdate > '1995-03-15'
SQL Server
Estimate    Actual  Operator
      10        10  LIMIT 0
      10        10  SORT Expr1009, o_orderdate
  222132     11281  INNER JOIN HASH ON l_orderkey = o_orderkey
  228150    145216  │└INNER JOIN HASH ON o_custkey = c_custkey
   29868     29949   │└TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'
   72698    145216   TABLE SCAN orders WHERE o_orderdate < '1995-03-15' AND BLOOM(o_custkey)
  124448     11281  AGGREGATE SUM(Expr1010) AS Expr1009 GROUP BY HASH l_orderkey
  323565     29857  PROJECT l_extendedprice * (1. - l_discount) AS Expr1010
  323565     29857  TABLE SCAN lineitem WHERE l_shipdate > '1995-03-15' AND BLOOM(l_orderkey)

Commentary

Optimal Join Order

Optimisers only seeking left deep trees will consume signficant memory while executing the query.

The key optimisation in join ordering is to create a bushy join on customer and orders before joining to lineitem. This allows both l_shipdate > '1995-03-15' and o_orderdate < '1995-03-15' to be be pre-filtered before the join to lineitem greatly reducing the amount of data to be processed.

Optimisers who understand bloom filters will be able to harvest further filters by combining the customer and orders filters before accessing lineitem.

Heap Sort

The LIMIT 10 in the query is best optimised with heap sort nodes. This is typically done with a special sort operator that can cut the stream early. Good optimisers and execution engines find this optimisation and reduce the amount of data to be sorted before emitting the top 10.

In the dbprove tool, this shows up a the sort not only emitting a few rows.