panam <panam@xxxxxxx> writes: > In my DB, there is a large table containing messages and one that contains > message boxes. > Messages are assigned to boxes via a child parent link m->b. > In order to obtain the last message for a specific box, I use the following > SQL: > SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id > AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id; BTW, this query doesn't actually match the EXPLAIN outputs... > So from my limited experience, the only significant difference I see is that > the Hash Anti Join takes a lot more time in plan 2, but I do not understand > why. Whatever's going on is below the level that EXPLAIN can show. I can think of a couple of possibilities: 1. The "extra" rows in the slower case all manage to come out to the same hash value, or some very small number of distinct hash values, such that we spend a lot of time searching a single hash chain. But it's hard to credit that adding 0.4% more rows could result in near 100x slowdown, no matter how bad their distribution. 2. There's some inefficiency in the use of the temp files, though again it's far from clear why your two cases would be noticeably different there. Possibly enabling log_temp_files would tell you something useful. One other thing I'm not following is how come it's using hash temp files at all, when you claim in your later message that you've got work_mem set to 1GB. It should certainly not take more than a couple meg to hold 20K rows at 16 payload bytes per row. You might want to check whether that setting actually took effect. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance