Hi all.
I was wondering if there were any plans to support predicate push-down optimization for subqueries (and views) with aggregates?
I was recently bit by this, as I had incorrectly assumed that this was an optimization that was in place, and designed quite a bit around that assumption, only to get hit with terrible performance when more data got loaded into the system.
Currently I had to solve the issue by having aggregate tables which store the data, which is maintained by triggers on IUD. This gets messy quick, as I have some aggregates which are dependent on 5-6 other tables. I'd love to be able to just store the logic for calculating aggregates in a view, and use that at query time instead of having to deal with it like this.
I have written a test case script that explains the type of queries I am talking about, and the issues. It's heavily simplified compared to a real system, but it'll do. Please see attached.
Thanks,
-Adam
CREATE SCHEMA test; SET search_path = 'test'; CREATE TABLE header ( header_id serial primary key, description text not null, amount numeric not null ); CREATE TABLE detail_1 ( detail_1_id serial primary key, header_id integer not null references header (header_id), quantity numeric not null, rate numeric not null ); CREATE TABLE detail_2 ( detail_2_id serial primary key, header_id integer not null references header (header_id), amount numeric not null ); INSERT INTO header (description, amount) SELECT 'header record ' || generate_series, random() * 100 FROM generate_series(1, 1000); INSERT INTO detail_1 (header_id, quantity, rate) SELECT header_id, random() * 50, random() * 10 FROM header INNER JOIN generate_series(1, 800) ON random() < 0.5; INSERT INTO detail_2 (header_id, amount) SELECT header_id, random() * 120 FROM header INNER JOIN generate_series(1, 7) ON random() < 0.5; CREATE VIEW header_total AS SELECT header.header_id , coalesce(detail_1.amount, 0) AS detail_1_amount , coalesce(detail_1.detail_1_count, 0) AS detail_1_count , coalesce(detail_2.amount, 0) AS detail_2_amount , coalesce(detail_2.detail_2_count, 0) AS detail_2_count , coalesce(detail_1.amount, 0) + coalesce(detail_2.amount, 0) as detail_total , header.amount = coalesce(detail_1.amount, 0) + coalesce(detail_2.amount, 0) as balanced FROM header LEFT JOIN ( SELECT header_id , sum(rate * quantity) as amount , count(detail_1_id) as detail_1_count FROM detail_1 GROUP BY header_id ) detail_1 ON header.header_id = detail_1.header_id LEFT JOIN ( SELECT header_id , sum(amount) as amount , count(detail_2_id) as detail_2_count FROM detail_2 GROUP BY header_id ) detail_2 ON header.header_id = detail_2.header_id; CREATE INDEX idx_detail_1_header_id ON detail_1 (header_id); CREATE INDEX idx_detail_2_header_id ON detail_2 (header_id); ANALYZE header; ANALYZE detail_1; ANALYZE detail_2; --Quick, is able to push down because it's a simple equality check SELECT * FROM header INNER JOIN header_total USING (header_id) WHERE header.header_id = 26; --Slow, no pushdown SELECT * FROM header INNER JOIN header_total USING (header_id) WHERE header.header_id < 200; --Slow, no pushdown SELECT * FROM header INNER JOIN header_total USING (header_id) WHERE header.description like '%5%'; DROP SCHEMA test;
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general