PlannerTPC-H — TPCH-Q04

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
DuckDB
6,050,773 rows
6.1M
3,974,931 rows
4M
5 rows
5
231,607 rows
232K
47,831 rows
48K
0 rows
0
PostgreSQL
3,843,516 rows
3.8M
52,204 rows
52K
47,832 rows
48K
3,791,312 rows
3.8M
47,852 rows
48K
0 rows
0
SQL Server
3,843,515 rows
3.8M
3,791,311 rows
3.8M
5 rows
5
52,204 rows
52K
47,831 rows
48K
0 rows
0

Actual Query Plans

Query Plan per Engine ?
Query Plan
PostgreSQL
Estimate    Actual  Operator
       5         5  GROUP BY SORT o_orderpriority AGGREGATE COUNT(*)
      20        20  GROUP BY SORT o_orderpriority AGGREGATE PARTIALCOUNT(*)
   20760     47832  SORT o_orderpriority
   20760     47832  LEFT SEMI JOIN HASH ON o_orderkey = l_orderkey
 1999608   3791312  │└TABLE SCAN lineitem WHERE l_commitdate < l_receiptdate
   70248     52204  TABLE SCAN orders WHERE (o_orderdate >= '1995-02-01') AND (o_orderdate < '1995-05-01')
DuckDB
Estimate    Actual  Operator
       5         5  SORT o_orderpriority
       5         5  GROUP BY HASH #0 AGGREGATE count_star()
   60000     47831  PROJECT o_orderpriority
   60000         0  RIGHT SEMI INNER JOIN HASH ON o_orderkey IS NOT DISTINCT FROM o_orderkey
  271669     52204  │└GROUP BY HASH #0 AGGREGATE 
   60000     47831   RIGHT SEMI INNER JOIN HASH ON o_orderkey IS NOT DISTINCT FROM o_orderkey
  271669         0   │└SCAN EMPTY
  449122    131572   PROJECT o_orderkey
  449122    131572   INNER JOIN HASH ON l_orderkey = o_orderkey
  271669         0   │└SCAN MATERIALISED
 1199764   3791155   FILTER l_commitdate < l_receiptdate
 5998820   5998569   TABLE SCAN lineitem
  300000     52204  TABLE SCAN orders WHERE o_orderdate >= '1995-02-01' AND o_orderdate < '1995-05-01'
SQL Server
Estimate    Actual  Operator
       5         5  SORT o_orderpriority
       5         5  PROJECT CONVERT_IMPLICIT int,Expr1009,0 AS Expr1006
       5         5  GROUP BY HASH AGGREGATE COUNT(*) AS Expr1009
   41855     47831  INNER JOIN HASH ON l_orderkey = o_orderkey
   52883     52204  │└TABLE SEEK orders WHERE o_orderdate >= '1995-02-01' AND o_orderdate < '1995-05-01'
 1799650   3791311  TABLE SCAN lineitem

Commentary

## EXIST decorrelation into SEMI JOIN

This expression shoudl be decorrelated:

EXISTS (SELECT *
          FROM tpch.lineitem
         WHERE l_orderkey = o_orderkey
           AND l_commitdate < l_receiptdate)

Optimizers will generally convert this into a SEMI JOIN.

Flipping the inner and outer side of the join (for optimisers that know how to do this) will greatly reduce the amount of memory consumed and it opens up additional opportunities for bloom filtering.