On 03/10/17 04:29, Tom Lane wrote:
Mariel Cherkassky <mariel.cherkassky@xxxxxxxxx> writes:
explain analyze SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma
WHERE Ma.Bb_Open_Date =
(SELECT Bb_Open_Date
FROM Manuim Man
WHERE Man.User_Id = Ma.User_Id order
by bb_open_date desc limit 1
)
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of
SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma,
(SELECT User_Id, max(Bb_Open_Date) as max
FROM Manuim Man
GROUP BY User_Id) ss
WHERE Ma.User_Id = ss.User_Id AND
Ma.Bb_Open_Date = ss.max
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
This is still not going to be instantaneous, but it might be better.
It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.
regards, tom lane
Further ideas based on Tom's rewrite: If that MAX is still expensive it
might be worth breaking
SELECT User_Id, max(Bb_Open_Date) as max
FROM Manuim Man
GROUP BY User_Id
out into a VIEW, and considering making it MATERIALIZED, or creating an
equivalent trigger based summary table (there are examples in the docs
of how to do this).
Cheers
Mark
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance