Hi, Attached is a query and its corresponding plan, where sorting of the CTE acts seems to be the bottle neck. It is a real execution plan captured with the auto_explain module. The query is recursive. In each iteration CTE acts is sorted again, which is obviously quite expensive for about 24000 rows and the same number of iterations. So I tried to put the ordering over the keys (d_id, activation_count) into the CTE definition itself. This is honoured, when evaluating the CTE but not for the iteration, where the CTE acts is still sorted again. I cannot see a reason for this. A simple CTE scan with filter condition should be enough. Removing the order by from the definition of the CTE has absolutely no impact on the performance, which is quite obvious regarding the number of iterations. Further it has no impact on the query plan at all. It only removes the sort node from the CTE acts node. Do I miss something which would make the plan incorrect or is the planner just not intelligent enough to recognize that a table is sorted by the desired keys? I hope the attachments prevent outlook from destroying any text formatting. Thanks in advance -- Regards, Robert
CTE Scan on he_per_dispenser (cost=75008.64..75720.96 rows=35616 width=16) (actual time=141.806..205579.272 rows=53152 loops=1) Output: he_per_dispenser.he_id, he_per_dispenser.activation_id CTE acts -> Sort (cost=8298.16..8431.04 rows=53152 width=24) (actual time=113.128..117.709 rows=53152 loops=1) Output: a.id, e.dispenser_id, a.activation_count, e."timestamp" Sort Key: e.dispenser_id, a.activation_count Sort Method: quicksort Memory: 5689kB -> Hash Join (cost=1531.92..4126.30 rows=53152 width=24) (actual time=25.653..66.205 rows=53152 loops=1) Output: a.id, e.dispenser_id, a.activation_count, e."timestamp" Hash Cond: (e.id = a.id) -> Seq Scan on events e (cost=0.00..1280.93 rows=78193 width=20) (actual time=0.006..8.615 rows=78193 loops=1) Output: e.id, e."timestamp", e.dispenser_id, e.relocation_id, e.synthetic -> Hash (cost=867.52..867.52 rows=53152 width=12) (actual time=18.737..18.737 rows=53152 loops=1) Output: a.id, a.activation_count -> Seq Scan on activations a (cost=0.00..867.52 rows=53152 width=12) (actual time=0.005..8.938 rows=53152 loops=1) Output: a.id, a.activation_count CTE he_per_dispenser -> Recursive Union (cost=1369.30..66577.60 rows=35616 width=40) (actual time=141.803..205530.157 rows=53152 loops=1) -> Hash Join (cost=1369.30..5359.69 rows=266 width=40) (actual time=141.803..155.424 rows=11 loops=1) Output: a.activation_id, acts.dispenser_id, (min(acts.activation_count)), a."timestamp", CASE WHEN (a."timestamp" <= (p."timestamp" + $1)) THEN p.last_hygiene_event ELSE a.activation_id END Hash Cond: ((a.dispenser_id = acts.dispenser_id) AND (a.activation_count = (min(acts.activation_count)))) -> CTE Scan on acts a (cost=0.00..1063.04 rows=53152 width=24) (actual time=113.130..118.725 rows=53152 loops=1) Output: a.activation_id, a.dispenser_id, a.activation_count, a."timestamp" -> Hash (cost=1366.30..1366.30 rows=200 width=24) (actual time=28.644..28.644 rows=11 loops=1) Output: acts.dispenser_id, (min(acts.activation_count)), p."timestamp", p.last_hygiene_event -> Hash Left Join (cost=1359.05..1366.30 rows=200 width=24) (actual time=28.634..28.639 rows=11 loops=1) Output: acts.dispenser_id, (min(acts.activation_count)), p."timestamp", p.last_hygiene_event Hash Cond: (acts.dispenser_id = p.dispenser_id) -> HashAggregate (cost=1328.80..1331.30 rows=200 width=8) (actual time=28.617..28.620 rows=11 loops=1) Output: acts.dispenser_id, min(acts.activation_count) -> CTE Scan on acts (cost=0.00..1063.04 rows=53152 width=8) (actual time=0.001..15.560 rows=53152 loops=1) Output: acts.activation_id, acts.dispenser_id, acts.activation_count, acts."timestamp" -> Hash (cost=19.00..19.00 rows=900 width=20) (actual time=0.002..0.002 rows=0 loops=1) Output: p."timestamp", p.last_hygiene_event, p.dispenser_id -> Seq Scan on processed_activations_cache p (cost=0.00..19.00 rows=900 width=20) (actual time=0.002..0.002 rows=0 loops=1) Output: p."timestamp", p.last_hygiene_event, p.dispenser_id -> Merge Join (cost=5439.41..6050.56 rows=3535 width=40) (actual time=3.172..8.274 rows=2 loops=24813) Output: a.activation_id, a.dispenser_id, a.activation_count, a."timestamp", CASE WHEN ((a."timestamp" - s."timestamp") <= $1) THEN s.he_id ELSE a.activation_id END Merge Cond: ((s.dispenser_id = a.dispenser_id) AND (((s.activation_count + 1)) = a.activation_count)) -> Sort (cost=204.52..211.17 rows=2660 width=24) (actual time=0.005..0.006 rows=2 loops=24813) Output: s."timestamp", s.he_id, s.dispenser_id, s.activation_count Sort Key: s.dispenser_id, ((s.activation_count + 1)) Sort Method: quicksort Memory: 25kB -> WorkTable Scan on he_per_dispenser s (cost=0.00..53.20 rows=2660 width=24) (actual time=0.001..0.001 rows=2 loops=24813) Output: s."timestamp", s.he_id, s.dispenser_id, s.activation_count -> Sort (cost=5234.90..5367.78 rows=53152 width=24) (actual time=0.001..2.540 rows=40748 loops=24813) Output: a.activation_id, a.dispenser_id, a.activation_count, a."timestamp" Sort Key: a.dispenser_id, a.activation_count Sort Method: quicksort Memory: 5689kB -> CTE Scan on acts a (cost=0.00..1063.04 rows=53152 width=24) (actual time=0.000..5.727 rows=53152 loops=1) Output: a.activation_id, a.dispenser_id, a.activation_count, a."timestamp"
-- $1 = INTERVAL'1.5s' in the execution plan INSERT INTO transformation.high_level_events( id, activation_id) WITH RECURSIVE acts AS ( /* This CTE should prevent duplicate physical tablescans. * An order by as the plan suggests, seems worthless. The * executed plan will will sort this CTE again in each iteration * of he_per_day. */ SELECT a.id AS activation_id, e.day_id, a.activation_count, e.timestamp FROM transformation.activations a JOIN transformation.events e USING(id) ORDER BY day_id, activatoin_count), he_per_day AS ( SELECT a.activation_id, min_acts.day_id, min_acts.activation_count, a.timestamp, CASE WHEN a.timestamp <= p.timestamp + $1 THEN p.last_high_level_event ELSE a.activation_id END AS he_id -- identifies a specific high level event FROM (SELECT day_id, MIN(activation_count) AS activation_count FROM acts GROUP BY day_id) min_acts JOIN acts a USING (day_id, activation_count) -- check whether it belongs to the last high_level event processed LEFT OUTER JOIN transformation.processed_activations_cache p USING (day_id) UNION ALL SELECT a.activation_id, a.day_id, a.activation_count, a.timestamp, CASE WHEN a.timestamp - s.timestamp <= $1 THEN s.he_id ELSE a.activation_id END AS he_id FROM acts a JOIN he_per_day s ON a.day_id = s.day_id AND a.activation_count = s.activation_count + 1) SELECT he_id AS id, activation_id FROM he_per_day;
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general