PlannerTPC-H — TPCH-Q03

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
DuckDB
4,111,077 rows
4.1M
4,080,841 rows
4.1M
10 rows
10
276,724 rows
277K
28,114 rows
28K
0 rows
0
PostgreSQL
4,111,171 rows
4.1M
4,080,935 rows
4.1M
28,125 rows
28K
276,720 rows
277K
28,115 rows
28K
0 rows
0
SQL Server
4,111,171 rows
4.1M
4,080,935 rows
4.1M
10 rows
10
276,724 rows
277K
28,114 rows
28K
0 rows
0

Actual Query Plans

Query Plan per Engine ?
Query Plan
PostgreSQL
Estimate    Actual  Operator
      10        10  LIMIT 10
  311826        10  SORT SUM(l_extendedprice * ('1' - l_discount)), o_orderdate
  311826     10680  GROUP BY SORT l_orderkey, o_orderdate, o_shippriority AGGREGATE SUM(l_extendedprice * ('1' - l_discount))
  389780     28115  SORT l_orderkey, o_orderdate, o_shippriority
  389780     28115  INNER JOIN HASH ON l_orderkey = o_orderkey
  223220    138360  │└INNER JOIN HASH ON o_custkey = c_custkey
   12479     30236   │└TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'
 1117955    686085   TABLE SCAN orders WHERE o_orderdate < '1995-03-15'
 4224670   3394850  TABLE SCAN lineitem WHERE l_shipdate > '1995-03-15'
DuckDB
Estimate    Actual  Operator
      10        10  LIMIT 10
      10        10  SORT SUM(l_extendedprice * (1 - l_discount)), o_orderdate
   63986     10680  PROJECT l_orderkey, revenue, o_orderdate, o_shippriority
   63986     10680  GROUP BY HASH #0, #1, #2 AGGREGATE SUM(#3)
   63986     28114  PROJECT l_orderkey, o_orderdate, o_shippriority, l_extendedprice * (1.000 - l_discount)
   63986     28114  INNER JOIN HASH ON l_orderkey = o_orderkey
   79290    138362  │└INNER JOIN HASH ON o_custkey = c_custkey
   37500     30236   │└FILTER c_custkey >= 3
   37500     30236    TABLE SCAN customer WHERE c_mktsegment = 'MACHINERY'
  300000    686071   TABLE SCAN orders WHERE o_orderdate < '1995-03-15'
 1199764   3394770  TABLE SCAN lineitem WHERE l_shipdate > '1995-03-15'
SQL Server
Estimate    Actual  Operator
      10        10  SORT Expr1007, o_orderdate
  352852     10680  PROJECT CASE WHEN Expr1013 = 0 THEN NULL ELSE Expr1014 END AS Expr1007
  352852     10680  GROUP BY HASH AGGREGATE COUNT(Expr1008) AS Expr1013, SUM(Expr1008) AS Expr1014, IN(o_orderdate) AS o_orderdate, IN(o_shippriority) AS o_shippriority
 1047040     28114  INNER JOIN HASH ON o_orderkey = l_orderkey
  410020    138362  │└INNER JOIN HASH ON c_custkey = o_custkey
   30089     30236   │└TABLE SEEK customer WHERE c_mktsegment = 'MACHINERY'
  684664    686085   TABLE SEEK orders WHERE o_orderdate < '1995-03-15'
 3394880   3394850  PROJECT l_extendedprice * (1. - l_discount) AS Expr1008
 3394880   3394850  TABLE SCAN lineitem

Commentary



Optimal Join Order

Optimisers only seeking left deep trees will consume signficant memory while executing the query.

The key optimisation in join ordering is to create a bushy join on customer and orders before joining to lineitem. This allows both l_shipdate > '1995-03-15' and o_orderdate < '1995-03-15' to be be pre-filtered before the join to lineitem greatly reducing the amount of data to be processed.

Optimisers who understand bloom filters will be able to harvest further filters by combining the customer and orders filters before accessing lineitem.

Heap Sort

The LIMIT 10 in the query is best optimised with heap sort nodes. This is typically done with a special sort operator that can cut the stream early. Good optimisers and execution engines find this optimisation and reduce the amount of data to be sorted before emitting the top 10.

In the dbprove tool, this shows up a the sort not only emitting a few rows.