PlannerTPC-H — TPCH-Q17

Engine Compare

Accuracy chart, rows processed ?
Scan
Scan
Seek
Seek
Join Probe
Join
Sort
Sort
Hash Build
Hash
Aggregate
Agg
Distribute
Dist
ClickHouse
Estimation Error
Est Err
Row Operations
Ops
12,002,628
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
206,029
206K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
6,001,413
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
6,001,752
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
Databricks
Estimation Error
Est Err
Row Operations
Ops
12,002,826
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
265,413
265K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
735
735
Rank
Estimation Error
Est Err
Row Operations
Ops
6,401,948
6.4M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
DuckDB
Estimation Error
Est Err
Row Operations
Ops
11,995,356
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
12,007,216
12M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
18,285
18K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,566
6.6K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
PostgreSQL
Estimation Error
Est Err
Row Operations
Ops
6,191,386
6.2M
Rank
Estimation Error
Est Err
Row Operations
Ops
189,973
190K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,001,215
6M
Rank
Estimation Error
Est Err
Row Operations
Ops
0
0
Rank
Estimation Error
Est Err
Row Operations
Ops
537
537
Rank
Estimation Error
Est Err
Row Operations
Ops
190,510
191K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank
SQL Server
Estimation Error
Est Err
Row Operations
Ops
18,285
18K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,566
6.6K
Rank
Estimation Error
Est Err
Row Operations
Ops
6,029
6K
Rank
Estimation Error
Est Err
Row Operations
Ops
12,058
12K
Rank
Estimation Error
Est Err
Row Operations
Ops
12,595
13K
Rank
Estimation Error
Est Err
Row Operations
Ops
0
Rank

Actual Query Plans

Query Plan per Engine ?
Query Plan
ClickHouse
Estimate    Actual  Operator
       -         1  PROJECT a2 / 7. AS avg_yearly
       -         1  AGGREGATE SUM(l_extendedprice) AS a2
       -       537  PROJECT l_extendedprice
       -       537  FILTER l_quantity < a8
       -      6029  PROJECT a8, l_extendedprice, l_quantity
       -      6029  INNER JOIN HASH ON PROJECTION_386.p_partkey = PROJECTION_383.l_partkey
       -   6001215  │└PROJECT l_partkey, l_extendedprice, l_quantity
       -   6001215   PROJECT l_extendedprice, l_partkey, l_quantity
       -   6001215   TABLE SCAN lineitem
       -       198  PROJECT p_partkey, a8
       -       198  PROJECT p_partkey, a8
       -       198  INNER JOIN HASH ON PROJECTION_392.l_partkey = PROJECTION_389.p_partkey
       -       198  │└PROJECT p_partkey AS p_partkey_right
       -       198   PROJECT p_partkey
       -       198   TABLE SCAN part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
       -    200000  PROJECT p_partkey, a8
       -    200000  PROJECT 0.2 * a9 AS a8, l_partkey AS p_partkey
       -    200000  AGGREGATE AVG(l_quantity) AS a9 GROUP BY HASH l_partkey
       -   6001215  PROJECT l_quantity, l_partkey, l_partkey
       -   6001215  PROJECT l_quantity, l_partkey, l_partkey
       -   6001215  TABLE SCAN lineitem WHERE l_partkey = l_partkey
DuckDB
Estimate    Actual  Operator
       1         1  PROJECT avg_yearly
       -         1  AGGREGATE SUM(#0)
  170091       537  PROJECT l_extendedprice
  170091       537  FILTER CAST(l_quantity AS DOUBLE) < SUBQUERY
       0         0  RIGHT SEMI INNER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
  112547       198  │└AGGREGATE  GROUP BY HASH #3
       0      6029   RIGHT OUTER JOIN HASH ON p_partkey IS NOT DISTINCT FROM p_partkey
       -         0   │└SCAN EMPTY
 1770559       198   PROJECT l_avg, p_partkey
 1770559       198   AGGREGATE AVG(#1) GROUP BY HASH #0
 3541118      6029   PROJECT p_partkey, l_quantity
 3541118      6029   INNER JOIN HASH ON l_partkey = p_partkey
  112547         0   │└SCAN MATERIALISED
 6001215   5997579   TABLE SCAN lineitem
  170091      6029  INNER JOIN HASH ON l_partkey = p_partkey
    5406       198  │└TABLE SCAN part WHERE p_brand = 'Brand#13' AND p_container = 'MED CAN'
 6001215   5997579  TABLE SCAN lineitem
Databricks
Estimate    Actual  Operator
     957         0  SEQUENCE
       1         1  ├─AGGREGATE SUM(l_extendedprice)
       1         4   DISTRIBUTE GATHER
       1         4   AGGREGATE SUM(l_extendedprice)
   29800       537   INNER JOIN HASH ON p_partkey = l_partkey
     957       198   │└INNER JOIN HASH ON p_partkey = li.l_partkey
     957       198    │└DISTRIBUTE GATHER
 6000000       198     TABLE SCAN part WHERE (p_brand = 'Brand#13'collate UTF8_BINARY) AND (p_container = 'MED CAN'collate UTF8_BINARY)
       -    200000    FILTER l_avg IS NOT NULL
       -    200000    AGGREGATE AVG(li.l_quantity) GROUP BY HASH li.l_partkey
       -    399994    DISTRIBUTE HASH ON li.l_partkey
       -    399994    AGGREGATE AVG(li.l_quantity) GROUP BY HASH li.l_partkey
       -   6001215    TABLE SCAN lineitem
   29800     65413   DISTRIBUTE HASH ON l_partkey
  193000     65413   FILTER 
 6000000   6001215   TABLE SCAN lineitem
     957         0  └─FILTER 
  200000         1    DISTRIBUTE GATHER
  200000         1    AGGREGATE blockbloomfilteraggregate(bloomexpr(p_partkey,p_partkey),198L,2048L)
  200000       198    TABLE SCAN part WHERE (p_brand = 'Brand#13'collate UTF8_BINARY) AND (p_container = 'MED CAN'collate UTF8_BINARY)
PostgreSQL
Estimate    Actual  Operator
       1         1  SEQUENCE
    6029      6029  ├─AGGREGATE 0.2 * AVG(l_quantity)
  186899    189973   TABLE SEEK lineitem AS li
       1         1  └─AGGREGATE SUM(l_extendedprice) / 7.0
    2000       537    INNER JOIN HASH ON l_partkey = p_partkey AND (l_quantity < (SubPlan 1))
     249       198    │└TABLE SCAN part AS part WHERE (p_brand = 'Brand#13') AND (p_container = 'MED CAN')
 6001215   6001215    TABLE SCAN lineitem AS lineitem
SQL Server
Estimate    Actual  Operator
       1         1  SEQUENCE
   30545      6029  ├─MATERIALISE AS m5
   30545      6029   GROUP BY SORT l_partkey
   30545      6029   SORT l_partkey
   30545      6029   INNER JOIN HASH ON l_partkey = p_partkey
    1017       198   │└TABLE SCAN part WHERE p_container = 'MED CAN' AND p_brand = 'Brand#13'
   60012      6029   TABLE SCAN lineitem WHERE BLOOM(l_partkey)
       1         1  └─PROJECT Expr1010 / 7.0 AS Expr1011
       1         1    PROJECT CASE WHEN Expr1056 = 0 THEN NULL ELSE Expr1057 END AS Expr1010
       1         1    AGGREGATE COUNT(*) AS Expr1056, SUM(l_extendedprice) AS Expr1057
      30       537    INNER JOIN LOOP ON l_quantity < 0.2 * Expr1008
      30      6029    │└SCAN MATERIALISED m5
       1       198    PROJECT CASE WHEN Expr1054 = 0 THEN NULL ELSE Expr1055 / CONVERT_IMPLICIT(decimal(19,0),Expr1054,0) END AS Expr1008
       1       198    AGGREGATE COUNT(*) AS Expr1054, SUM(l_quantity) AS Expr1055
      30      6029    SCAN MATERIALISED m5

Commentary

The Correlated Subquery

Consider this expression:

l_quantity < (SELECT 0.2 * AVG(l_quantity)
                    FROM tpch.lineitem
                    WHERE l_partkey = p_partkey);

The crucial optimisation is for the optimiser to rewrite inoto this:

SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM tpch.lineitem
INNER JOIN tpch.part
    ON l_partkey = p_partkey
INNER JOIN (SELECT l_partkey
                 , 0.2 * AVG(l_quantity) AS avg_quantity
            FROM tpch.lineitem
            GROUP BY l_partkey) AS decorrelated
    ON decorrelated.l_partkey = p_partkey
WHERE p_brand = 'Brand#13'
  AND p_container = 'MED CAN'
  AND l_quantity < decorrelated.avg_quantityh;

That allows much higher speed hash joins.

Agggressive pushdown

It is possible to push down the highly selective filters on p_brand and p_container into the subquery, further reducing the amount of data processed.