PlannerTPC-H — TPCH-Q05

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
6,373,251
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,608,212
7.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
1,615,345
1.6M
Rank
Estimation Error
Est Err
Row Operations
Ops
5,953
6K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
6,330,085
6.3M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,491,163
6.5M
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
254,028
254K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,988
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
6,372,861
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,175,985
7.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,995
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
1,067,198
1.1M
Rank
Estimation Error
Est Err
Row Operations
Ops
7,015
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
545,134
545K
Rank
Estimation Error
Est Err
Row Operations
Ops
434,997
435K
Rank
Estimation Error
Est Err
Row Operations
Ops
5
5
Rank
Estimation Error
Est Err
Row Operations
Ops
373,992
374K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,993
7K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -         5  PROJECT SUM(l_extendedprice * (1 - l_discount))
       -         5  SORT SUM(l_extendedprice * (1 - l_discount))
       -         5  GROUP BY HASH n_name AGGREGATE SUM(l_extendedprice * (1 - l_discount))
       -      5953  PROJECT 1 - l_discount, l_extendedprice * (1 - l_discount)
       -      5953  INNER JOIN HASH ON n_regionkey = r_regionkey
       -         1  │└TABLE SCAN region WHERE r_name = 'EUROPE'
       -     30728  INNER JOIN HASH ON s_nationkey = n_nationkey
       -        25  │└TABLE SCAN nation
       -     30728  PROJECT s_nationkey = c_nationkey
       -     30728  INNER JOIN HASH ON (l_suppkey,c_nationkey) = (s_suppkey,s_nationkey)
       -     10000  │└TABLE SCAN supplier
       -    773968  INNER JOIN HASH ON o_custkey = c_custkey
       -    150000  │└TABLE SCAN customer
       -    773968  INNER JOIN HASH ON l_orderkey = o_orderkey
       -    214405  │└TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
       -   5998820  TABLE SCAN lineitem
DuckDB
Estimate    Actual  Operator
       -         5  SORT SUM(l_extendedprice * (1 - l_discount))
  272721         5  GROUP BY HASH #0 AGGREGATE SUM(#1)
  287133      6988  PROJECT n_name, l_extendedprice * (1.000 - l_discount)
  287133      6988  INNER JOIN HASH ON l_suppkey = s_suppkeyn_nationkey = s_nationkey
   10000     10000  │└TABLE SCAN supplier
  246102    171079  INNER JOIN HASH ON l_orderkey = o_orderkey
   60992     42896  │└INNER JOIN HASH ON o_custkey = c_custkey
   28846     30048   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
  150000    108053    TABLE SCAN customer WHERE c_custkey >= 3
  300000    213997   TABLE SCAN orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 5998820   5998029  TABLE SCAN lineitem WHERE l_suppkey >= 1
PostgreSQL
Estimate    Actual  Operator
      25         5  SORT SUM(l_extendedprice * ('1' - l_discount))
      25         5  GROUP BY SORT n_name AGGREGATE SUM(l_extendedprice * ('1' - l_discount))
     125        25  GROUP BY SORT n_name AGGREGATE PARTIALSUM(l_extendedprice * ('1' - l_discount))
    8410      6990  SORT n_name
    8410      6990  INNER JOIN HASH ON (l_suppkey = s_suppkey) AND (n_nationkey = s_nationkey)
   10000     10000  │└TABLE SCAN supplier
  210835    171090  INNER JOIN HASH ON o_custkey = c_custkey
   25000     30048  │└INNER JOIN HASH ON c_nationkey = n_nationkey
      10        10   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25    TABLE SCAN nation
  125000    150000   TABLE SCAN customer
 1054170    856050  INNER JOIN HASH ON l_orderkey = o_orderkey
  340125    214015  │└TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
 7498525   5998820  TABLE SCAN lineitem
SQL Server
Estimate    Actual  Operator
       5         5  SORT Expr1013
       5         5  PROJECT CASE WHEN globalagg1023 = 0 THEN NULL ELSE globalagg1025 END AS Expr1013
       5         5  GROUP BY HASH AGGREGATE SUM(partialagg1022) AS globalagg1023, SUM(partialagg1024) AS globalagg1025
       5         5  INNER JOIN HASH ON r_regionkey = n_regionkey
       1         1  │└TABLE SEEK region WHERE r_name = 'EUROPE'
       2         5  GROUP BY HASH AGGREGATE COUNT(Expr1014) AS partialagg1022, SUM(Expr1014) AS partialagg1024
    3890      6988  INNER JOIN HASH ON n_nationkey = s_nationkey AND s_suppkey = l_suppkey
   10000     10000  │└TABLE SEEK supplier
   97261      6988  FILTER BLOOM(n_nationkey,l_suppkey)
   97261    171091  INNER JOIN HASH ON o_orderkey = l_orderkey
   21431     42896  │└INNER JOIN HASH ON c_nationkey = n_nationkey
      25        25   │└TABLE SEEK nation
   21431     42896   INNER JOIN HASH ON c_custkey = o_custkey
  150000    150000   │└TABLE SEEK customer
   21431    214017   TABLE SEEK orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
   59988    171091  PROJECT l_extendedprice * (1. - l_discount) AS Expr1014
   59988    171091  TABLE SEEK lineitem WHERE BLOOM(l_orderkey)

Commentary



This filter on o_orderdate

This is a quick test for partitioning elimination.

We don't have to join via supplier to harvest the filter - we can go via customer and filter that first. customer is a much larger table than supplier (by 15x) so it is better to apply an aggressive filter on a larger table instead of a smaller one.

Since orders is already filtered, we can also reduce it further by pre-joining to customer, nation and region of to orders before we join to lineitem. If we do this, we still have the filter on c_nationkey = s_nationkey later in the query. Even though we took the filter early on customer we must still apply the filter to supplier later.

By combining the filters r_name = 'ASIA' and o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' in this way, we can greatly reduce the work needed in the query.