PlannerTPC-H — TPCH-Q11

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
810,001
810K
Rank
Estimation Error
Est Err
Row Operations
Ops
810,000
810K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,225
1.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
30,537
31K
Rank
Estimation Error
Est Err
Row Operations
Ops
30,160
30K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
1,604,219
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,592,469
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
1,225
1.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
66,383
66K
Rank
Estimation Error
Est Err
Row Operations
Ops
88,836
89K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
870,321
870K
Rank
Estimation Error
Est Err
Row Operations
Ops
838,515
839K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,225
1.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
31,762
32K
Rank
Estimation Error
Est Err
Row Operations
Ops
60,321
60K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
61,076
61K
Rank
Estimation Error
Est Err
Row Operations
Ops
61,074
61K
Rank
Estimation Error
Est Err
Row Operations
Ops
61,074
61K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,225
1.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
61,074
61K
Rank
Estimation Error
Est Err
Row Operations
Ops
60,320
60K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
61,076
61K
Rank
Estimation Error
Est Err
Row Operations
Ops
62,299
62K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,225
1.2K
Rank
Estimation Error
Est Err
Row Operations
Ops
89,589
90K
Rank
Estimation Error
Est Err
Row Operations
Ops
60,320
60K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -      1225  PROJECT ps_partkey, a4 AS v
       -      1225  SORT a4
       -      1225  PROJECT a4, ps_partkey
       -      1225  FILTER (a4 > 7589177.37) AS a2
       -     28515  AGGREGATE SUM(a6) AS a4 GROUP BY HASH ps_partkey
       -     30160  PROJECT ps_partkey, ps_supplycost * ps_availqty AS a6
       -     30160  PROJECT ps_partkey, ps_supplycost, ps_availqty
       -     30160  INNER JOIN HASH ON PROJECTION_304.ps_suppkey = PROJECTION_295.s_suppkey
       -       377  │└PROJECT s_suppkey
       -       377   PROJECT s_suppkey
       -       377   INNER JOIN HASH ON PROJECTION_301.s_nationkey = PROJECTION_298.n_nationkey
       -         1   │└PROJECT n_nationkey
       -         1    PROJECT n_nationkey
       -         1    TABLE SCAN nation WHERE n_name = 'JAPAN'
       -     10000   PROJECT s_nationkey, s_suppkey
       -     10000   PROJECT s_suppkey, s_nationkey
       -     10000   TABLE SCAN supplier
       -    800000  PROJECT ps_suppkey, ps_partkey, ps_supplycost, ps_availqty
       -    800000  PROJECT ps_partkey, ps_supplycost, ps_availqty, ps_suppkey
       -    800000  TABLE SCAN partsupp
DuckDB
Estimate    Actual  Operator
       -      1225  SORT SUM(ps_supplycost * ps_availqty)
      56      1225  PROJECT ps_partkey, v
      56      1225  SEQUENCE
   30769     30160  ├─MATERIALISE AS m___common_subplan_1
   30769     30160   INNER JOIN HASH ON ps_suppkey = s_suppkey
     384       377   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       1         1    │└TABLE SCAN nation WHERE n_name = 'JAPAN'
   10000     10000    TABLE SCAN supplier
  800000    800000   TABLE SCAN partsupp
      56      1225  └─INNER JOIN LOOP ON CAST(SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(18,0))) AS DECIMAL(38,6)) > SUBQUERY
       1         1    │└PROJECT CASE WHEN (#1 > 1) THEN ("error"('More than one row returned by a subquery used as an expression - scalar subqueries can only return a single row. Use "SET scalar_subquery_error_on_multiple_rows=false" to revert to previous behavior of returning a random row.')) ELSE #0 END
       -         1     AGGREGATE "first" #0, count_star()
       1         1     PROJECT SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(18,0))) * 0.0001
       -         1     AGGREGATE SUM(#0)
       0     30160     PROJECT ps_supplycost * CAST(ps_availqty AS DECIMAL(18,0))
   30769     30160     SCAN MATERIALISED __common_subplan_1
   23860     28515    AGGREGATE SUM(#1) GROUP BY HASH #0
   30769     30160    PROJECT ps_partkey, ps_supplycost * CAST(ps_availqty AS DECIMAL(18,0))
   30769     30160    SCAN MATERIALISED __common_subplan_1
Databricks
Estimate    Actual  Operator
       1         1  SEQUENCE
    8060      1225  ├─SORT v DESC NULLS LAST
    8060      1225   DISTRIBUTE HASH ON v DESC NULLS LAST
    8060      1225   FILTER 
     400     28515   AGGREGATE SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(20,0))) GROUP BY HASH ps_partkey
     400     28515   DISTRIBUTE HASH ON ps_partkey
     400     28515   AGGREGATE SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(20,0))) GROUP BY HASH ps_partkey
     400     30160   INNER JOIN HASH ON s_suppkey = ps_suppkey
     400       377   │└DISTRIBUTE GATHER
   33400       377    INNER JOIN HASH ON s_nationkey = n_nationkey
   10000      6063    │└TABLE SCAN supplier
   33400         1    DISTRIBUTE GATHER
      25         1    TABLE SCAN nation WHERE n_name = 'JAPAN'collate UTF8_BINARY
  800000    796046   TABLE SCAN partsupp
       1         1  └─AGGREGATE SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(20,0)))
       1         1    DISTRIBUTE GATHER
       1         1    AGGREGATE SUM(ps_supplycost * CAST(ps_availqty AS DECIMAL(20,0)))
   33400     30160    INNER JOIN HASH ON s_suppkey = ps_suppkey
   10000      6063    │└TABLE SCAN supplier
      25    796046    TABLE SCAN partsupp
PostgreSQL
Estimate    Actual  Operator
   10667      1225  SEQUENCE
       1         1  ├─AGGREGATE SUM(ps_supplycost * ps_availqty) * 0.0001
   32000     30160   INNER JOIN LOOP ON ps_suppkey = s_suppkey
     400       377   │└INNER JOIN LOOP ON s_nationkey = n_nationkey
       1         1    │└TABLE SCAN nation AS nation_1 WHERE nation_1.n_name = 'JAPAN'
     400       377    TABLE SEEK supplier AS supplier_1
   30160     30160   TABLE SEEK partsupp AS partsupp_1
   10667      1225  └─SORT SUM(ps_supplycost * ps_availqty)
   10667      1225    AGGREGATE SUM(ps_supplycost * ps_availqty) GROUP BY HASH ps_partkey
   32000     30160    INNER JOIN LOOP ON ps_suppkey = s_suppkey
     400       377    │└INNER JOIN LOOP ON s_nationkey = n_nationkey
       1         1     │└TABLE SCAN nation AS nation WHERE n_name = 'JAPAN'
     400       377     TABLE SEEK supplier AS supplier
   30160     30160    TABLE SEEK partsupp AS partsupp
SQL Server
Estimate    Actual  Operator
    9031      1225  SORT Expr1009
    9031      1225  INNER JOIN LOOP ON n_name = 'JAPAN' AND n_name = 'JAPAN'
   30106     28515  │└AGGREGATE SUM(Expr1021) AS Expr1009 GROUP BY HASH ps_partkey
   32002     30160   INNER JOIN HASH ON ps_suppkey = s_suppkey
     400       377   │└INNER JOIN HASH ON s_nationkey = n_nationkey
       1         1    │└TABLE SCAN nation WHERE n_name = 'JAPAN'
    1000       377    TABLE SCAN supplier WHERE BLOOM(s_nationkey)
   80000     30160   PROJECT ps_supplycost * CONVERT_IMPLICIT(decimal(10,0),ps_availqty,0) AS Expr1021
   80000     30160   TABLE SCAN partsupp WHERE BLOOM(ps_suppkey)
       1         1  PROJECT CASE WHEN Expr1042 = 0 THEN NULL ELSE Expr1043 END AS Expr1019
       1         1  AGGREGATE COUNT(Expr1022) AS Expr1042, SUM(Expr1022) AS Expr1043
   32002     30160  INNER JOIN HASH ON ps_suppkey = s_suppkey
     400       377  │└INNER JOIN HASH ON s_nationkey = n_nationkey
       1         1   │└TABLE SCAN nation WHERE n_name = 'JAPAN'
    1000       377   TABLE SCAN supplier WHERE BLOOM(s_nationkey)
   80000     30160  PROJECT ps_supplycost * CONVERT_IMPLICIT(decimal(10,0),ps_availqty,0) AS Expr1022
   80000     30160  TABLE SCAN partsupp WHERE BLOOM(ps_suppkey)

Commentary

Statistics (Histograms and MCV)

To pick the optimal bushy join order in Query 11, the optimizer needs accurate statistics like histograms and Most Common Values (MCV). These allow it to know that a filter on n_name = 'JAPAN' will result in a very small number of matching rows in the supplier table.

Bushy Join

Query 11 benefits from a bushy join tree: partsupp ⨝ (suppliernation). Since the filtered result of supplier ⨝ nation is tiny, the hash table built for this join is very small, making this more memory-efficient and faster than a traditional left-deep tree.

Canonical Forms vs. Init Plans

How a database handles non-correlated subqueries (like the one in the HAVING clause) affects the simplicity of its execution engine. Some databases use "init plans" to execute the subquery first, while others canonicalize the subquery into a simple cross-join to a single row.