Plan to support predicate push-down into subqueries with aggregates?

Adam Brusselback <adambrusselback@xxxxxxxxx> · Tue, 8 Mar 2016 16:38:56 -0500

Hi all.I was wondering if there were any plans to support predicate push-down optimization for subqueries (and views) with aggregates?

I was recently bit by this, as I had incorrectly assumed that this was an optimization that was in place, and designed quite a bit around that assumption, only to get hit with terrible performance when more data got loaded into the system.

Currently I had to solve the issue by having aggregate tables which store the data, which is maintained by triggers on IUD.  This gets messy quick, as I have some aggregates which are dependent on 5-6 other tables.  I'd love to be able to just store the logic for calculating aggregates in a view, and use that at query time instead of having to deal with it like this.

I have written a test case script that explains the type of queries I am talking about, and the issues.  It's heavily simplified compared to a real system, but it'll do.  Please see attached.

Thanks,
-Adam
CREATE SCHEMA test;

SET search_path = 'test';

CREATE TABLE header (
  header_id serial primary key,
  description text not null,
  amount numeric not null
  );

CREATE TABLE detail_1 (
  detail_1_id serial primary key,
  header_id integer not null references header (header_id),
  quantity numeric not null,
  rate numeric not null
  );

CREATE TABLE detail_2 (
  detail_2_id serial primary key,
  header_id integer not null references header (header_id),
  amount numeric not null
  );

  INSERT INTO header (description, amount)
  SELECT 'header record ' || generate_series, random() * 100
  FROM generate_series(1, 1000);

  INSERT INTO detail_1 (header_id, quantity, rate)
  SELECT header_id, random() * 50, random() * 10
  FROM header
  INNER JOIN generate_series(1, 800)
  ON random() < 0.5;

  INSERT INTO detail_2 (header_id, amount)
  SELECT header_id, random() * 120
  FROM header
  INNER JOIN generate_series(1, 7)
  ON random() < 0.5;

CREATE VIEW header_total AS 
SELECT header.header_id
, coalesce(detail_1.amount, 0) AS detail_1_amount
, coalesce(detail_1.detail_1_count, 0) AS detail_1_count
, coalesce(detail_2.amount, 0) AS detail_2_amount
, coalesce(detail_2.detail_2_count, 0) AS detail_2_count
, coalesce(detail_1.amount, 0) + coalesce(detail_2.amount, 0) as detail_total
, header.amount = coalesce(detail_1.amount, 0) + coalesce(detail_2.amount, 0) as balanced
FROM header
LEFT JOIN (
SELECT header_id
, sum(rate * quantity) as amount 
, count(detail_1_id) as detail_1_count
FROM detail_1
GROUP BY header_id
) detail_1
ON header.header_id = detail_1.header_id
LEFT JOIN (
SELECT header_id
, sum(amount) as amount 
, count(detail_2_id) as detail_2_count
FROM detail_2
GROUP BY header_id
) detail_2
ON header.header_id = detail_2.header_id;

CREATE INDEX idx_detail_1_header_id ON detail_1 (header_id);
CREATE INDEX idx_detail_2_header_id ON detail_2 (header_id);

ANALYZE header;
ANALYZE detail_1;
ANALYZE detail_2;

--Quick, is able to push down because it's a simple equality check
SELECT *
FROM header
INNER JOIN header_total
USING (header_id)
WHERE header.header_id = 26;

--Slow, no pushdown
SELECT *
FROM header
INNER JOIN header_total
USING (header_id)
WHERE header.header_id < 200;

--Slow, no pushdown
SELECT *
FROM header
INNER JOIN header_total
USING (header_id)
WHERE header.description like '%5%';

DROP SCHEMA test;
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general