Hi all,

Cédric Villemain-3 wrote:
>
> without explaining further why the antijoin has bad performance
> without cluster, I wonder why you don't use this query :
>
> SELECT b.id,
>        max(m.id)
> FROM box b, message m
> WHERE m.box_id = b.id
> GROUP BY b.id;
>
> looks similar and fastest.
>

I actually did use a similar strategy in the meantime as a mitigation (while I was having trouble with the "left join" query this thread is about). It was

    SELECT MAX(e.id) FROM event_message e WHERE e.box_id = id

and it performed worse than the "left join" query in the general case (i.e. before my problems began). At the end of this post I explain why I think I cannot use the solution you suggested above.

Kevin Grittner wrote:
>
> Each connection can allocate work_mem, potentially several times.
> On a machines without hundreds of GB of RAM, that pair of settings
> could cause severe swapping.
>

Indeed, thanks for the warning. These settings are not meant for production; I only set them to rule out small cache sizes as the cause of the performance degradation.

Kevin Grittner wrote:
>
> I think you would need a left join to actually get identical
> results:
>
> SELECT b.id, max(m.id)
> FROM box b
> LEFT JOIN message m ON m.box_id = b.id
> GROUP BY b.id;
>
> But yeah, I would expect this approach to be much faster. Rather
> easier to understand and harder to get wrong, too.
>

Correct, it is much faster, even with unclustered ids. However, I think I cannot use it because of the way the query is generated (by Hibernate). The (simplified) base query is just

    SELECT b.id FROM box b

and the subquery

    (SELECT m1.id
     FROM message m1
     LEFT JOIN message m2
       ON (m1.box_id = m2.box_id AND m1.id < m2.id)
     WHERE m2.id IS NULL AND m1.box_id = b.id) AS lastMessageId

comes from a Hibernate formula (containing more or less plain SQL) that determines the last message id for that box. It ought to return just one row, not multiple.
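
To make the constraint concrete, here is a minimal sketch of the kind of drop-in subqueries that would fit into such a formula. It only checks that they return the same lastMessageId as the anti-join; it says nothing about speed on PostgreSQL, which depends on the indexes and data involved. Assumptions: it runs against an in-memory SQLite database via Python's sqlite3 rather than PostgreSQL, the tiny box/message contents are made up, and the ORDER BY ... LIMIT 1 variant is a common idiom that nobody in this thread has proposed.

```python
import sqlite3

# Hypothetical miniature version of the box/message tables from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE box (id INTEGER PRIMARY KEY);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        box_id INTEGER REFERENCES box(id)
    );
    INSERT INTO box (id) VALUES (1), (2), (3);
    INSERT INTO message (id, box_id) VALUES (10, 1), (11, 1), (12, 2), (13, 1);
""")  # box 3 deliberately has no messages

# Each entry is a scalar subquery correlated on b.id, i.e. something that
# could replace the formula without touching the base query.
SUBQUERIES = {
    # The formula as it is today: keep the message that has no newer sibling.
    "anti_join": """
        SELECT m1.id
        FROM message m1
        LEFT JOIN message m2
          ON (m1.box_id = m2.box_id AND m1.id < m2.id)
        WHERE m2.id IS NULL AND m1.box_id = b.id
    """,
    # Correlated aggregate, as in the MAX(...) mitigation mentioned above.
    "correlated_max": """
        SELECT max(m.id) FROM message m WHERE m.box_id = b.id
    """,
    # Assumption: not from the thread, just another common formulation.
    "order_limit": """
        SELECT m.id FROM message m
        WHERE m.box_id = b.id
        ORDER BY m.id DESC
        LIMIT 1
    """,
}

# Wrap every candidate in the same (simplified) base query and collect rows.
results = {}
for name, subquery in SUBQUERIES.items():
    sql = f"SELECT b.id, ({subquery}) AS lastMessageId FROM box b ORDER BY b.id"
    results[name] = conn.execute(sql).fetchall()

# All three variants agree, including NULL (None) for the empty box.
assert results["correlated_max"] == results["anti_join"]
assert results["order_limit"] == results["anti_join"]
print(results["anti_join"])  # [(1, 13), (2, 12), (3, None)]
```

Any of these can be pasted into the formula in place of the anti-join; whether one of them actually beats it would have to be measured with EXPLAIN ANALYZE on the real data.
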
So I am constrained to the subquery in all optimization attempts (I cannot combine the two queries as you did), or at least I do not see how. If you have an idea for a more performant subquery, though, let me know, as it can easily be replaced.

Thanks for your help and suggestions,
panam

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4429125.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.