PlannerTPC-H — TPCH-Q10

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
1,686,369
1.7M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,708,970
1.7M
Rank
Estimation Error
Est Err
Row Operations
Ops
38,182
38K
Rank
Estimation Error
Est Err
Row Operations
Ops
380,100
380K
Rank
Estimation Error
Est Err
Row Operations
Ops
115,050
115K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
1,681,513
1.7M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,681,488
1.7M
Rank
Estimation Error
Est Err
Row Operations
Ops
26,475
26K
Rank
Estimation Error
Est Err
Row Operations
Ops
229,998
230K
Rank
Estimation Error
Est Err
Row Operations
Ops
157,817
158K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
220,628
221K
Rank
Estimation Error
Est Err
Row Operations
Ops
220,603
221K
Rank
Estimation Error
Est Err
Row Operations
Ops
38,182
38K
Rank
Estimation Error
Est Err
Row Operations
Ops
215,372
215K
Rank
Estimation Error
Est Err
Row Operations
Ops
115,050
115K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
322,447
322K
Rank
Estimation Error
Est Err
Row Operations
Ops
114,948
115K
Rank
Estimation Error
Est Err
Row Operations
Ops
284,208
284K
Rank
Estimation Error
Est Err
Row Operations
Ops
153,232
153K
Rank
Estimation Error
Est Err
Row Operations
Ops
191,682
192K
Rank
Estimation Error
Est Err
Row Operations
Ops
115,050
115K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
210,731
211K
Rank
Estimation Error
Est Err
Row Operations
Ops
345,150
345K
Rank
Estimation Error
Est Err
Row Operations
Ops
38,182
38K
Rank
Estimation Error
Est Err
Row Operations
Ops
345,150
345K
Rank
Estimation Error
Est Err
Row Operations
Ops
115,050
115K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -     38182  PROJECT c_custkey, c_name, a1 AS revenue, c_acctbal, n_name, c_address, c_phone, c_comment
       -     38182  SORT a1
       -     38182  PROJECT a1, c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
       -     38182  AGGREGATE SUM(a5) AS a1 GROUP BY HASH c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
       -    115050  PROJECT c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment, l_extendedprice * (1 - l_discount) AS a5
       -    115050  PROJECT l_extendedprice, l_discount, c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment, n_name
       -    115050  INNER JOIN HASH ON PROJECTION_272.c_nationkey = PROJECTION_269.n_nationkey
       -        25  │└PROJECT n_nationkey, n_name
       -        25   PROJECT n_name, n_nationkey
       -        25   TABLE SCAN nation
       -    115050  PROJECT c_nationkey, l_extendedprice, l_discount, c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment
       -    115050  PROJECT l_extendedprice, l_discount, c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment, c_nationkey
       -    115050  INNER JOIN HASH ON PROJECTION_278.o_custkey = PROJECTION_275.c_custkey
       -    150000  │└PROJECT c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment, c_nationkey
       -    150000   PROJECT c_custkey, c_name, c_acctbal, c_address, c_phone, c_comment, c_nationkey
       -    150000   TABLE SCAN customer
       -    115050  PROJECT o_custkey, l_extendedprice, l_discount
       -    115050  PROJECT l_extendedprice, l_discount, o_custkey
       -    115050  INNER JOIN HASH ON PROJECTION_284.l_orderkey = PROJECTION_281.o_orderkey
       -     57474  │└PROJECT o_orderkey, o_custkey
       -     57474   PROJECT o_orderkey, o_custkey
       -     57474   TABLE SCAN orders WHERE (o_orderdate >= '1994-06-01') AND (o_orderdate < '1994-09-01')
       -   1478870  PROJECT l_orderkey, l_extendedprice, l_discount
       -   1478870  PROJECT l_orderkey, l_extendedprice, l_discount
       -   1478870  TABLE SCAN lineitem WHERE l_returnflag = 'R'
DuckDB
Estimate    Actual  Operator
       -     38182  SORT SUM(l_extendedprice * (1 - l_discount))
  409397     38182  PROJECT c_custkey, c_name, revenue, c_acctbal, n_name, c_address, c_phone, c_comment
  409397     38182  AGGREGATE SUM(#7) GROUP BY HASH #0, #1, #2, #3, #4, #5, #6
  409397    115050  PROJECT c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment, l_extendedprice * (1.00 - l_discount)
  409397    115050  INNER JOIN HASH ON l_orderkey = o_orderkey
  304444     57474  │└INNER JOIN HASH ON c_custkey = o_custkey
  300000     57474   │└TABLE SCAN orders WHERE o_orderdate >= '1994-06-01' AND o_orderdate < '1994-09-01'
  144230     42848   INNER JOIN HASH ON c_nationkey = n_nationkey
      25        25   │└TABLE SCAN nation
  150000     42848   TABLE SCAN customer WHERE c_custkey <= 149999
 2000405    120281  TABLE SCAN lineitem WHERE l_returnflag = 'R'
Databricks
Estimate    Actual  Operator
  237000     26475  SORT revenue DESC NULLS LAST
  237000     26475  DISTRIBUTE HASH ON revenue DESC NULLS LAST
  237000     38182  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
  237000     42767  DISTRIBUTE HASH ON c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
  237000     42767  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
  237000    115050  INNER JOIN HASH ON o_orderkey = l_orderkey
  237000     57474  │└DISTRIBUTE GATHER
   58500     57474   INNER JOIN HASH ON c_nationkey = n_nationkey
   58500     57474   │└INNER JOIN HASH ON o_custkey = c_custkey
   58500     57474    │└DISTRIBUTE GATHER
 6000000     57474     TABLE SCAN orders WHERE (o_orderdate >= DATE'1994-06-01') AND (o_orderdate < DATE'1994-09-01')
      25    147059    TABLE SCAN customer
   58500        25   DISTRIBUTE GATHER
  150000        25   TABLE SCAN nation
 1500000   1476955  TABLE SCAN lineitem WHERE l_returnflag = 'R'collate UTF8_BINARY
PostgreSQL
Estimate    Actual  Operator
   55777     38182  SORT SUM(l_extendedprice * ('1' - l_discount))
   55777     38182  AGGREGATE c_name, SUM(l_extendedprice * ('1' - l_discount)), c_acctbal, c_address, c_phone, c_comment GROUP BY SORT c_custkey, n_name
   69720    115050  SORT c_custkey, n_name
   69720    115050  INNER JOIN LOOP ON l_orderkey = o_orderkey
   23211     19158  │└INNER JOIN HASH ON c_nationkey = n_nationkey
      25        25   │└TABLE SCAN nation AS nation
   23211     19158   INNER JOIN HASH ON c_custkey = o_custkey
   69633     57474   │└TABLE SCAN orders AS orders WHERE (o_orderdate >= '1994-06-01') AND (o_orderdate < '1994-09-01')
  187500    150000   TABLE SCAN customer AS customer
  172422    114948  TABLE SEEK lineitem AS lineitem WHERE l_returnflag = 'R'
SQL Server
Estimate    Actual  Operator
   34460     38182  SORT Expr1012
   34460     38182  AGGREGATE SUM(Expr1013) AS Expr1012, c_name, c_acctbal, c_phone, c_address, c_comment GROUP BY HASH c_custkey, n_name
   66460    115050  INNER JOIN HASH ON c_nationkey = n_nationkey
      25        25  │└TABLE SCAN nation
   66460    115050  INNER JOIN HASH ON c_custkey = o_custkey
   66460    115050  │└INNER JOIN HASH ON l_orderkey = o_orderkey
   57385     57474   │└TABLE SCAN orders WHERE o_orderdate >= '1994-06-01' AND o_orderdate < '1994-09-01'
  148032    115050   PROJECT l_extendedprice * (1. - l_discount) AS Expr1013
  148032    115050   TABLE SCAN lineitem WHERE l_returnflag = 'R' AND BLOOM(l_orderkey)
   15000     38182  TABLE SCAN customer WHERE BLOOM(c_custkey)

Commentary

Histograms and Estimation

Correct estimation of the filter on o_orderdate is crucial for picking the optimal join order. Without histograms in the statistics, optimizers may mis-estimate the selectivity, which can lead to picking a "bad bushy" join that actually increases the total number of operations and memory usage.

Functional Dependency

By identifying functional dependencies between columns, an optimizer can realize that grouping by c_custkey is sufficient to satisfy the GROUP BY clause. This allows for aggregating the data earlier in the plan, before joining with the customer and nation tables.

Bloom Filters

Query 10 is an ideal candidate for Bloom filters. By building a Bloom filter on the filtered orders table, the engine can significantly reduce the number of rows scanned from the large lineitem table (often by 6x or more), effectively transferring the filter from one table to another during the scan.