PlannerTPC-H — TPCH-Q05

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
DuckDB
6,330,085 rows
6.3M
6,491,163 rows
6.5M
5 rows
5
254,028 rows
254K
6,988 rows
7K
0 rows
0
PostgreSQL
6,372,861 rows
6.4M
7,175,985 rows
7.2M
6,995 rows
7K
1,067,194 rows
1.1M
7,015 rows
7K
0 rows
0
SQL Server
6,372,867 rows
6.4M
6,533,953 rows
6.5M
5 rows
5
254,041 rows
254K
6,988 rows
7K
0 rows
0

Actual Query Plans

Query Plan per Engine ?
Query Plan
PostgreSQL
Estimate    Actual  Operator
      25         5  SORT SUM(l_extendedprice * ('1' - l_discount))
      25         5  GROUP BY SORT n_name AGGREGATE SUM(l_extendedprice * ('1' - l_discount))
     125        25  GROUP BY SORT n_name AGGREGATE PARTIALSUM(l_extendedprice * ('1' - l_discount))
    8410      6990  SORT n_name
    8410      6990  INNER JOIN HASH ON (l_suppkey = s_suppkey) AND (n_nationkey = s_nationkey)
   10000     10000  │└TABLE SCAN supplier
  210835    171090  INNER JOIN HASH ON o_custkey = c_custkey
   12500     30049  │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5   │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1    │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25        25    TABLE SCAN nation
   62500    150000   TABLE SCAN customer
 1054170    856050  INNER JOIN HASH ON l_orderkey = o_orderkey
  340125    214015  │└TABLE SCAN orders WHERE (o_orderdate >= '1995-01-01') AND (o_orderdate < '1996-01-01')
 7498525   5998820  TABLE SCAN lineitem
DuckDB
Estimate    Actual  Operator
  272721         5  SORT SUM(l_extendedprice * (1 - l_discount))
  272721         5  GROUP BY HASH #0 AGGREGATE SUM(#1)
  287133      6988  PROJECT n_name, l_extendedprice * (1.000 - l_discount)
  287133      6988  INNER JOIN HASH ON l_suppkey = s_suppkeyn_nationkey = s_nationkey
   10000     10000  │└TABLE SCAN supplier
  246102    171079  INNER JOIN HASH ON l_orderkey = o_orderkey
   60992     42896  │└INNER JOIN HASH ON o_custkey = c_custkey
   28846     30048   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN HASH ON n_regionkey = r_regionkey
       1         1     │└TABLE SCAN region WHERE r_name = 'EUROPE'
      25         5     TABLE SCAN nation
  150000    108053    TABLE SCAN customer WHERE c_custkey >= 3
  300000    213997   TABLE SCAN orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 5998820   5998029  TABLE SCAN lineitem WHERE l_suppkey >= 1
SQL Server
Estimate    Actual  Operator
       5         5  SORT Expr1013
       5         5  PROJECT CASE WHEN Expr1019 = 0 THEN NULL ELSE Expr1020 END AS Expr1013
       5         5  GROUP BY HASH AGGREGATE COUNT(Expr1014) AS Expr1019, SUM(Expr1014) AS Expr1020
    7720      6988  INNER JOIN HASH ON s_suppkey = l_suppkey AND n_nationkey = s_nationkey
   10000     10000  │└TABLE SEEK supplier
  193037    171091  INNER JOIN HASH ON o_orderkey = l_orderkey
   42779     42896  │└INNER JOIN HASH ON c_custkey = o_custkey
   30000     30049   │└INNER JOIN HASH ON c_nationkey = n_nationkey
       5         5    │└INNER JOIN LOOP ON n_regionkey = r_regionkey
       1         5     │└TABLE SEEK region WHERE r_name = 'EUROPE'
      25        25     TABLE SEEK nation
  150000    150000    TABLE SEEK customer
  214097    214017   TABLE SEEK orders WHERE o_orderdate >= '1995-01-01' AND o_orderdate < '1996-01-01'
 5998820   5998820  PROJECT l_extendedprice * (1. - l_discount) AS Expr1014
 5998820   5998820  TABLE SCAN lineitem

Commentary



This filter on o_orderdate

This is a quick test for partitioning elimination.

We don't have to join via supplier to harvest the filter - we can go via customer and filter that first. customer is a much larger table than supplier (by 15x) so it is better to apply an aggressive filter on a larger table instead of a smaller one.

Since orders is already filtered, we can also reduce it further by pre-joining to customer, nation and region of to orders before we join to lineitem. If we do this, we still have the filter on c_nationkey = s_nationkey later in the query. Even though we took the filter early on customer we must still apply the filter to supplier later.

By combining the filters r_name = 'ASIA' and o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01' in this way, we can greatly reduce the work needed in the query.