PlannerTPC-H — TPCH-Q17

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
DuckDB
11,841,430 rows
12M
11,853,388 rows
12M
0 rows
0
18,442 rows
18K
6,544 rows
6.5K
0 rows
0
PostgreSQL
6,187,501 rows
6.2M
5,998,820 rows
6M
0 rows
0
6,080 rows
6.1K
188,944 rows
189K
0 rows
0
SQL Server
12,363 rows
12K
928 rows
928
0 rows
0
6,544 rows
6.5K
6,544 rows
6.5K
0 rows
0

Actual Query Plans

Query Plan per Engine ?
Query Plan
PostgreSQL
Estimate    Actual  Operator
       1         1  GROUP BY SIMPLE AGGREGATE SUM(l_extendedprice) / 7.0
    1930       464  INNER JOIN HASH ON l_partkey = p_partkey
    6080      6080  │└GROUP BY SIMPLE AGGREGATE 0.2 * avg l_quantity
  188480    188480   TABLE SEEK li
     240       201   TABLE SCAN part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
 5998820   5998820  TABLE SCAN lineitem
DuckDB
Estimate    Actual  Operator
       1         1  PROJECT avg_yearly
  162389         1  GROUP BY SIMPLE AGGREGATE SUM(#0)
  162389       464  PROJECT l_extendedprice
  162389       464  FILTER CAST(l_quantity AS DOUBLE) < SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
  109375       202  │└GROUP BY HASH #0 AGGREGATE 
       0      6080   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
  109375         0   │└SCAN EMPTY
 1717800       202   PROJECT 0.2 * avg l_quantity, p_partkey
 1717800       202   GROUP BY HASH #0 AGGREGATE avg #1
 3435601      6080   PROJECT p_partkey, l_quantity
 3435601      6080   INNER JOIN HASH ON l_partkey = p_partkey
  109375         0   │└SCAN MATERIALISED
 5998820   5920614   TABLE SCAN lineitem
  162389      6080  INNER JOIN HASH ON l_partkey = p_partkey
    5406       202  │└TABLE SCAN part WHERE p_brand = 'Brand#13' AND p_container = 'MED CAN'
 5998820   5920614  TABLE SCAN lineitem
SQL Server
Estimate    Actual  Operator
       1         ∞  PROJECT Expr1009 / 7.0 AS Expr1010
       1         1  PROJECT CASE WHEN Expr1022 = 0 THEN NULL ELSE Expr1023 END AS Expr1009
       1         1  GROUP BY HASH AGGREGATE COUNT(l_extendedprice) AS Expr1022, SUM(l_extendedprice) AS Expr1023
 5998820       464  INNER JOIN LOOP ON l_quantity < 0.2 * Expr1007
      30       464  │└INNER JOIN LOOP ON 
      30      6080   │└TABLE SCAN lineitem
       1       202   PROJECT CASE WHEN Expr1020 = 0 THEN NULL ELSE Expr1021 / CONVERT_IMPLICIT(decimal 19,0 ,Expr1020,0) END AS Expr1007
       1       202   GROUP BY HASH AGGREGATE COUNT(l_quantity) AS Expr1020, SUM(l_quantity) AS Expr1021
      30      6080   TABLE SCAN lineitem
    1031       203  TABLE SCAN lineitem

Commentary



The Correlated Subquery

Consider this expression:

l_quantity < (SELECT 0.2 * AVG(l_quantity)
                    FROM tpch.lineitem
                    WHERE l_partkey = p_partkey);

The crucial optimisation is for the optimiser to rewrite inoto this:

SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM tpch.lineitem
INNER JOIN tpch.part
    ON l_partkey = p_partkey
INNER JOIN (SELECT l_partkey
                 , 0.2 * AVG(l_quantity) AS avg_quantity
            FROM tpch.lineitem
            GROUP BY l_partkey) AS decorrelated
    ON decorrelated.l_partkey = p_partkey
WHERE p_brand = 'Brand#13'
  AND p_container = 'MED CAN'
  AND l_quantity < decorrelated.avg_quantityh;

That allows much higher speed hash joins.

Agggressive pushdown

It is possible to push down the highly selective filters on p_brand and p_container into the subquery, further reducing the amount of data processed.