2011/5/26 panam <panam@xxxxxxx>: > Hi all, > > > Cédric Villemain-3 wrote: >> >> without explaining further why the antijoin has bad performance >> without cluster, I wonder why you don't use this query : >> >> SELECT b.id, >> max(m.id) >> FROM box b, message m >> WHERE m.box_id = b.id >> GROUP BY b.id; >> >> looks similar and fastest. >> > I actually did use a similar strategy in the meantime (during my problem > with the "left join" query we are talking about here all the time) for > mitigation. > It was > SELECT MAX(e.id) FROM event_message e WHERE e.box_id = id > and it performed worse in comparison to the "left join" query in the general > case (i.e. before my problems began). > At the end of this post is an explanation why I think I cannot use the > solution you suggested above. > > > Kevin Grittner wrote: >> >> Each connection can allocate work_mem, potentially several times. >> On a machines without hundreds of GB of RAM, that pair of settings >> could cause severe swapping. >> > Indeed, thanks for the warning. These settings are not for production but to > exclude a performance degradation because of small cache sizes. > > > Kevin Grittner wrote: >> >> I think you would need a left join to actually get identical >> results: >> >> SELECT b.id, max(m.id) >> FROM box b >> LEFT JOIN message m ON m.box_id = b.id >> GROUP BY b.id; >> >> But yeah, I would expect this approach to be much faster. Rather >> easier to understand and harder to get wrong, too. >> >> > Correct, it is much faster, even with unclustered ids. > However, I think I cannot use it because of the way that query is generated > (by hibernate). > The (simplyfied) base query is just > > SELECT b.id from box > > the subquery > > (SELECT m1.id FROM message m1 > LEFT JOIN message m2 > ON (m1.box_id = m2.box_id AND m1.id < m2.id ) > WHERE m2.id IS NULL AND m1.box_id = b.id) as lastMessageId > > is due to a hibernate formula (containing more or less plain SQL) to > determine the last message id for that box. It ought to return just one row, > not multiple. So I am constrained to the subquery in all optimization > attemps (I cannot combine them as you did), at least I do not see how. If > you have an idea for a more performant subquery though, let me know, as this > can easily be replaced. In production, if you have a decent IO system, you can lower random_page_cost and it may be faster using index (by default, with the use case you provided it choose a seqscan). It can be a bit tricky if you have to lower random_page_cost so much that it destroy others query plan but increase the perf of the current one. if it happens, post again :) (sometime need to change other cost parameters but it needs to be handle with care) I am not an hibernate expert, but I'll surprised if you can not drive hibernate to do what you want. > > Thanks for your help and suggestions > panam > > -- > View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4429125.html > Sent from the PostgreSQL - performance mailing list archive at Nabble.com. > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance