Hi all,

@Tom,

> BTW, this query doesn't actually match the EXPLAIN outputs...

You're right, it is actually just the "heavy" subquery of a larger query, which can be found here:
http://pastebin.com/fuGrt0tB

> One other thing I'm not following is how come it's using hash temp files
> at all, when you claim in your later message that you've got work_mem
> set to 1GB. It should certainly not take more than a couple meg to hold
> 20K rows at 16 payload bytes per row. You might want to check whether
> that setting actually took effect.

As I said, I drastically increased the buffer sizes (at least I intended to) to see if it changed anything. At first I thought it wouldn't, but yesterday (I think it was after a reboot) the "bad" queries were suddenly much faster (~20 secs, though still at least 3 times slower than the "good" queries). Today they were very slow again (I replayed the dumps in between). So I am not sure whether Postgres actually picks up the altered configuration (even after a reboot). Is there a way to determine the values actually in use?

@Craig,

> You can reproduce this behaviour consistently? It's _seriously_ weird,
> and the sort of thing that when I encounter myself I tend to ask "what
> else is going on that I'm missing?".

Yes, I can reproduce it consistently using a dump file.

> What happens if you DELETE more rows? Or fewer? What's the threshold?
> What happens if you DELETE rows from the start not the end, or a random selection?

From some experiments I made earlier, I conclude that the rows added last are the problem. Deleting the first 10% did not seem to speed up the bad query. However, I haven't checked that systematically.

> Does the problem persist if you DELETE the rows then CLUSTER the table
> before running the query?

Wow, I wasn't aware of CLUSTER. Applying it (clustering on the id PK) to the table causing the "bad" query worked wonders.
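In case it is useful for others reading the archive, this is roughly what I ran, both to check which work_mem value is actually in effect and to do the clustering; `my_table` and `my_table_pkey` are just placeholders for my real table and PK index names, and the `CLUSTER ... USING` syntax assumes a reasonably recent server (8.4+):

```sql
-- Check the value actually in effect for this session, and where it came from
-- (configuration file, default, override, etc.):
SHOW work_mem;

SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'work_mem';

-- Rewrite the table in index order, then refresh statistics
-- ("my_table" / "my_table_pkey" are placeholders):
CLUSTER my_table USING my_table_pkey;
ANALYZE my_table;
```

Note that CLUSTER is a one-time physical reorder; later inserts and updates are not kept in that order, so it has to be repeated if the ordering degrades.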
It now needs just 4.3 secs, compared to 600 secs before (now with one day of data added relative to the previous post), and 4.0 secs for the "good" query (also clustered), which is faster than the unclustered "good" query (about 6-8 secs).

> Does the problem persist if you DELETE the rows then REINDEX?

No, not noticeably.

> The answer from the rest of your post would appear to be 'no, it's being
> run in an otherwise-idle stand-alone test environment'. Right?

Correct.

It seems my issue is solved (at least for now). My impression is that it was just "bad luck" that the rows, both the original ones and those replayed from the dumps, were messed up with regard to their ids, especially (it seems) the newly added ones. This is consistent with the peculiarities of the query, which contains a pairwise id comparison that should benefit greatly from an ordered set of ids.

This also made me wonder how the internal plan is carried out. Is the engine able to exploit the fact that a part/range of the rows is totally or partially ordered on disk, e.g. using some kind of binary search or even "nearest neighbour" search within that section (i.e. a special "micro-plan" or algorithm)? Or is the speed-up "just" because related data is usually nearby, and most of the standard algorithms work best with clustered data?

If the first is not the case, would that be a potential point for improvement? Maybe it would even be more efficient if there were some sort of constraint that guarantees "ordered row" sections on disk, i.e. preventing the insertion of a row whose index value falls between two row values of an already ordered/clustered section. In the simplest case, the section would start with the "first" row and end with the "last" row (at the time of doing the equivalent of CLUSTER). So there would be a small list saying: rows with id x through id y are guaranteed to be ordered on disk (by id, for example), now and for all time.
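Incidentally, there seems to be a way to check how well a table's physical order matches a column, using the planner's own statistics: the correlation statistic is close to 1.0 (or -1.0) when the column's values are almost perfectly ordered on disk, and near 0 for random placement. Again, `my_table` and `id` are placeholders for my actual names:

```sql
-- correlation near 1.0 (or -1.0): rows are (nearly) ordered on disk
-- by this column; near 0: essentially random placement.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'my_table'
  AND attname = 'id';
```

If I understand correctly, the planner already uses this correlation when costing index scans, which might explain part of the difference between the clustered and unclustered runs.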
So, would you like to further investigate my previous issue? (I still think it is strange that performance suddenly dropped that dramatically.)

Many thanks and regards,
panam

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4425890.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance