Re: Plan to support predicate push-down into subqueries with aggregates?

rob stone <floriparob@xxxxxxxxx> · Thu, 10 Mar 2016 13:48:55 +1100

On Wed, 2016-03-09 at 12:25 -0500, Adam Brusselback wrote:
> I responded yesterday, but it seems to have gotten caught up because
> it was too big with the attachments... Here it is again.
> 
> Sorry about not posting correctly, hopefully I did it right this
> time.
> 
> So I wanted to see if Sql Server (2014) could handle this type of
> query differently than Postgres (9.5.1), so I got an instance of
> express installed and ported the test script to it.
> 
> I updated my Postgres script so the data is the same in each server. 
> The end result is Sql Server seems to be able to optimize all of
> these queries MUCH better than Postgres.
> I disabled parallelism in Sql Server to make the comparison fair.
> 
> I've attached the explain analyze results for Postgres, and the
> execution plan for Sql Server (in picture form... don't know a better
> way)
> 
> Results are:
> --Sql Server:15ms average
> --Postgres: 6ms average
> SELECT *
> FROM header
> INNER JOIN header_total
> ON header.header_id = header_total.header_id
> WHERE header.header_id = 26;
> 
> 
> --Sql Server: 15ms average
> --Postgres: 1250ms average
> SELECT *
> FROM header
> INNER JOIN header_total
> ON header.header_id = header_total.header_id
> WHERE header.header_id < 27
> AND header.header_id > 24;
> 
> 
> --Sql Server: 567ms average
> --Postgres: 1265ms average
> SELECT *
> FROM header
> INNER JOIN header_total
> ON header.header_id = header_total.header_id
> WHERE header.description like '%5%';
> 
> 
> --Sql Server: 15ms average
> --Postgres: 1252ms average
> SELECT *
> FROM header_total
> WHERE header_total.header_id IN (
> SELECT header_id
> FROM header
> WHERE header.header_id < 27
> AND header.header_id > 24);
> 

Hello Adam,

I can't believe that something originating from the dark side can do
things better. So I went thru your test case. Running this on my trusty
Compaq Presario with 5Gb of RAM and using Postgres 9.5.1.

1) Created your tables but re-named the column "description" as
"header_description". header has 1,000 rows, detail_1 has 400,391 rows
and detail_2 has 3,451 rows.
2) altered your index definitions to make them unique by including
their primary keys. (I have a penchant for unique indexes).
3) Ran the ANALYZE.
4) Shut down the database so as to clear the cache.
5) A while later started the database and created this view:-

CREATE OR REPLACE VIEW header_total AS
SELECT header.header_id, header.header_description,
amount_1, detail_1_count,
amount_2, detail_2_count
FROM header
LEFT JOIN (
SELECT header_id, SUM(rate * quantity) AS amount_1, COUNT(detail_1_id)
AS detail_1_count
FROM detail_1
GROUP BY detail_1.header_id
) detail_1
ON header.header_id = detail_1.header_id
LEFT JOIN (
SELECT header_id, SUM(amount) AS amount_2, COUNT(detail_2_id) AS
detail_2_count
FROM detail_2
GROUP BY detail_2.header_id
) detail_2
ON header.header_id = detail_2.header_id

Note that:- (a) I included header_description in the definition.
    (b) Removed some lines as if you want the total you may as well
include it in your select from the view, and for the life of me I
couldn't understand the purpose of:-

header.amount = coalesce(detail_1.amount, 0) +
coalesce(detail_2.amount, 0) as balanced

Is "balanced" supposed to be a boolean?

If you need header.amount include it in the view.

6) Ran your three queries and here are the timings from the log:-

2016-03-10 13:07:47 AEDTLOG:  duration: 0.221 ms  parse <unnamed>:
SELECT * FROM header_total WHERE header_id = 26
2016-03-10 13:07:47 AEDTLOG:  duration: 0.551 ms  bind <unnamed>:
SELECT * FROM header_total WHERE header_id = 26
2016-03-10 13:07:47 AEDTLOG:  duration: 1.103 ms  execute <unnamed>:
SELECT * FROM header_total WHERE header_id = 26
2016-03-10 13:07:54 AEDTLOG:  duration: 0.180 ms  parse <unnamed>:
SELECT * FROM header_total WHERE header_id < 200
2016-03-10 13:07:54 AEDTLOG:  duration: 0.481 ms  bind <unnamed>:
SELECT * FROM header_total WHERE header_id < 200
2016-03-10 13:07:55 AEDTLOG:  duration: 458.418 ms  execute <unnamed>:
SELECT * FROM header_total WHERE header_id < 200
2016-03-10 13:08:01 AEDTLOG:  duration: 0.230 ms  parse <unnamed>:
SELECT * FROM header_total WHERE header_description like '%5%'
2016-03-10 13:08:01 AEDTLOG:  duration: 0.542 ms  bind <unnamed>:
SELECT * FROM header_total WHERE header_description like '%5%'
2016-03-10 13:08:01 AEDTLOG:  duration: 459.346 ms  execute <unnamed>:
SELECT * FROM header_total WHERE header_description like '%5%'

I don't believe that (0.230 ms + 0.542 ms + 459.346 ms) could be
described as "slow" when it returns 271 rows.

Obviously it would help if there were more details about your
application. Also creating the test data via those bulk inserts doesn't
replicate any randomness that may occur via inserts made by an
application.

BTW, I'm a great fan of using views. 

HTH,
Rob

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general