PlannerTPC-H — TPCH-Q05

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
6,389,878
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,389,877
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
2,567,411
2.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,403
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
6,379,570
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,236,972
6.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
2,058,483
2.1M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,413
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
585,359
585K
Rank
Estimation Error
Est Err
Row Operations
Ops
759,742
760K
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
270,783
271K
Rank
Estimation Error
Est Err
Row Operations
Ops
7,403
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
390,553
391K
Rank
Estimation Error
Est Err
Row Operations
Ops
230,527
231K
Rank
Estimation Error
Est Err
Row Operations
Ops
564,994
565K
Rank
Estimation Error
Est Err
Row Operations
Ops
7,408
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
250,651
251K
Rank
Estimation Error
Est Err
Row Operations
Ops
7,418
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
562,535
563K
Rank
Estimation Error
Est Err
Row Operations
Ops
386,019
386K
Rank
Estimation Error
Est Err
Row Operations
Ops
10
10
Rank
Estimation Error
Est Err
Row Operations
Ops
575,348
575K
Rank
Estimation Error
Est Err
Row Operations
Ops
7,408
7.4K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -         5  PROJECT n_name, a1 AS revenue
       -         5  SORT a1
       -         5  PROJECT a1, n_name
       -         5  AGGREGATE SUM(a5) AS a1 GROUP BY HASH n_name
       -      7403  PROJECT n_name, l_extendedprice * (1 - l_discount) AS a5
       -      7403  PROJECT l_extendedprice, l_discount, n_name
       -      7403  INNER JOIN HASH ON tuple(PROJECTION_130.c_custkey,PROJECTION_130.c_nationkey) = tuple(PROJECTION_103.o_custkey,PROJECTION_103.s_nationkey)
       -    181905  │└PROJECT o_custkey, s_nationkey, l_extendedprice, l_discount, n_name
       -    181905   PROJECT l_extendedprice, l_discount, o_custkey, s_nationkey, n_name
       -    181905   INNER JOIN HASH ON PROJECTION_127.o_orderkey = PROJECTION_106.l_orderkey
       -   1191757   │└PROJECT l_orderkey, l_extendedprice, l_discount, s_nationkey, n_name
       -   1191757    PROJECT l_extendedprice, l_discount, l_orderkey, s_nationkey, n_name
       -   1191757    INNER JOIN HASH ON PROJECTION_124.l_suppkey = PROJECTION_109.s_suppkey
       -      1987    │└PROJECT s_suppkey, s_nationkey, n_name
       -      1987     PROJECT s_suppkey, s_nationkey, n_name
       -      1987     INNER JOIN HASH ON PROJECTION_121.s_nationkey = PROJECTION_112.n_nationkey
       -         5     │└PROJECT n_nationkey, n_name
       -         5      PROJECT n_name, n_nationkey
       -         5      INNER JOIN HASH ON PROJECTION_118.n_regionkey = PROJECTION_115.r_regionkey
       -         1      │└PROJECT r_regionkey
       -         1       PROJECT r_regionkey
       -         1       TABLE SCAN region WHERE r_name = 'EUROPE'
       -        25      PROJECT n_regionkey, n_name, n_nationkey
       -        25      PROJECT n_name, n_nationkey, n_regionkey
       -        25      TABLE SCAN nation
       -     10000     PROJECT s_nationkey, s_suppkey
       -     10000     PROJECT s_suppkey, s_nationkey
       -     10000     TABLE SCAN supplier
       -   6001215    PROJECT l_suppkey, l_extendedprice, l_discount, l_orderkey
       -   6001215    PROJECT l_extendedprice, l_discount, l_orderkey, l_suppkey
       -   6001215    TABLE SCAN lineitem
       -    228637   PROJECT o_orderkey, o_custkey
       -    228637   PROJECT o_orderkey, o_custkey
       -    228637   TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
       -    150000  PROJECT c_custkey, c_nationkey
       -    150000  PROJECT c_custkey, c_nationkey
       -    150000  TABLE SCAN customer
DuckDB
Estimate    Actual  Operator
       -         5  SORT SUM(l_extendedprice * (1 - l_discount))
  239028         5  AGGREGATE SUM(#1) GROUP BY HASH #0
  245638      7403  PROJECT n_name, l_extendedprice * (1.00 - l_discount)
  245638      7403  INNER JOIN HASH ON l_suppkey = s_suppkey AND n_nationkey = s_nationkey
   10000     10000  │└TABLE SCAN supplier
  245638    184384  INNER JOIN HASH ON l_orderkey = o_orderkey
   60888     46197  │└INNER JOIN HASH ON o_custkey = c_custkey
   28846     30197   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25     TABLE SCAN nation
  150000    149999    TABLE SCAN customer WHERE c_custkey <= 149999
  300000    228626   TABLE SCAN orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 6001215    196708  TABLE SCAN lineitem
Databricks
Estimate    Actual  Operator
       5         5  SORT revenue DESC NULLS LAST
       5         5  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH n_name
       5        10  DISTRIBUTE HASH ON n_name
       5        10  AGGREGATE SUM(l_extendedprice * (1BD - l_discount)) GROUP BY HASH n_name
  922000      7403  INNER JOIN HASH ON l_suppkey = s_suppkey
  922000    913927  │└INNER JOIN HASH ON o_orderkey = l_orderkey
  922000    228637   │└DISTRIBUTE GATHER
    2000    228637    INNER JOIN HASH ON o_custkey = c_custkey
 6000000    228637    │└TABLE SCAN orders WHERE (o_orderdate >= DATE'1995-01-01') AND (o_orderdate < DATE'1996-01-01')
    2000    150000    DISTRIBUTE GATHER
      25    150000    TABLE SCAN customer
 1500000   5994223   TABLE SCAN lineitem
  922000      1987  DISTRIBUTE GATHER
  228000      1987  INNER JOIN HASH ON n_nationkey = s_nationkey
  228000         5  │└DISTRIBUTE GATHER
       5         5   INNER JOIN HASH ON n_regionkey = r_regionkey
       5         5   │└TABLE SCAN nation
       5         1   DISTRIBUTE GATHER
   10000         1   TABLE SCAN region WHERE r_name = 'EUROPE'collate UTF8_BINARY
  150000      6704  TABLE SCAN supplier
PostgreSQL
Estimate    Actual  Operator
      25         5  SORT SUM(l_extendedprice * ('1' - l_discount))
      25         5  AGGREGATE SUM(l_extendedprice * ('1' - l_discount)) GROUP BY SORT n_name
      75        15  AGGREGATE PARTIAL SUM(l_extendedprice * ('1' - l_discount)) GROUP BY SORT n_name
    9123      7403  SORT n_name
    3041      2467  INNER JOIN HASH ON (l_suppkey = s_suppkey) AND (n_nationkey = s_nationkey)
   10000     10000  │└TABLE SCAN supplier AS supplier
  228108    184383  INNER JOIN LOOP ON l_orderkey = o_orderkey
   57015     46197  │└INNER JOIN LOOP ON o_custkey = c_custkey
   12500     10065   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region AS region WHERE r_name = 'EUROPE'
      25        25     TABLE SCAN nation AS nation
  187500    150000    TABLE SCAN customer AS customer
   90591     46201   TABLE SEEK orders AS orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
  646758    184326  TABLE SEEK lineitem AS lineitem
SQL Server
Estimate    Actual  Operator
       5         5  SORT Expr1018
       5         5  AGGREGATE SUM(partialagg1027) AS Expr1018 GROUP BY SORT n_name
      25         5  SORT n_name
      25         5  AGGREGATE SUM(Expr1019) AS partialagg1027 GROUP BY HASH n_name
    8550      7403  INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
    4275      7403  INNER JOIN HASH ON s_nationkey = n_nationkey
       2         5  │└TABLE SCAN nation WHERE BLOOM(n_regionkey)
    4275      7403  INNER JOIN HASH ON c_custkey = o_custkey AND s_nationkey = c_nationkey
  150000    150000  │└TABLE SCAN customer
  106865      7403  FILTER BLOOM(o_custkey,s_nationkey)
  106865    181905  INNER JOIN HASH ON l_orderkey = o_orderkey
  227610    228637  │└TABLE SCAN orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
   60012    181905  INNER JOIN HASH ON l_suppkey = s_suppkey
    1000      1987  │└TABLE SCAN supplier WHERE BLOOM(s_nationkey)
   60012    181905  PROJECT l_extendedprice * (1. - l_discount) AS Expr1019
   60012    181905  TABLE SCAN lineitem WHERE BLOOM(l_suppkey) AND BLOOM(l_orderkey)

Commentary

This filter on o_orderdate

This is a quick test for partitioning elimination.

We don't have to join via supplier to harvest the filter - we can go via customer and filter that first. customer is a much larger table than supplier (by 15x) so it is better to apply an aggressive filter on a larger table instead of a smaller one.

Since orders is already filtered, we can also reduce it further by pre-joining to customer, nation and region of to orders before we join to lineitem. If we do this, we still have the filter on c_nationkey = s_nationkey later in the query. Even though we took the filter early on customer we must still apply the filter to supplier later.

By combining the filters r_name = 'ASIA' and o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' in this way, we can greatly reduce the work needed in the query.