Dear All, I want to find all of the msg_ids from "messages" that are not in table "part_tsearch", where both tables are large but the result of the query is normally small, and often empty. To 'protect' the application against the unusual case where the result of the query is large I have added a limit clause: select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10; Currently there are about 30,000 rows in each table and about 500 rows in the result of the (un-limit-ed) query. So it is unusual in the sense that the size of the result is relatively large and the limit clause will take effect. Both tables are indexed by msg_id. This was taking longer than I expected, so I decided to explain-analyse it. However, when I run it under explain-analyse, it takes even longer than before: decimail=> select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10; msg_id -------- 67894 67809 52548 67745 67538 67540 67329 67246 67235 67140 (10 rows) (that takes about 2 seconds) decimail=> explain analyse select msg_id from messages where msg_id not in (select msg_id from part_tsearch) limit 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=4301.99..10534.34 rows=10 width=4) (actual time=6677.791..72417.068 rows=10 loops=1) -> Seq Scan on messages (cost=4301.99..11966058.86 rows=19193 width=4) (actual time=6677.725..72416.427 rows=10 loops=1) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=4301.99..4830.07 rows=37908 width=4) (actual time=0.097..39.494 rows=862 loops=903) -> Seq Scan on part_tsearch (cost=0.00..4115.08 rows=37908 width=4) (actual time=0.104..1679.258 rows=37901 loops=1) Total runtime: 72424.066 ms (7 rows) As I increase the limit the runtime increases as follows: limit normal runtime explain-anlyse runtime 10 2 77 20 5 309 40 12 807 80 88 160 149 320 1016 I was not patient enough to wait for the remaining explain-analyse results, but I feel that there is a linear slowdown of about 60x between the raw query and the explain-analyse version. In general, for other queries, explain-analyse reports runtimes that agree with the runtimes of the actual queries. But the peculiar behaviour of explain-analyse is really a distraction from the fact that the query is slow, especially when the limit value is large. The system seems to be largely CPU-bound during these long run-times. The query plan reported by explain-analyse is the same in each case. How many times is it actually doing the seq-scan on part_tsearch? I see that the rows value reported for "Materialize" is rather different in the planned and actual numbers. What is this telling me? I analysed the tables immediately before starting on this. I was hoping that it would be implemented using some sort of index scan on the two tables, maybe something involing bitmaps. Is there something that I can do to the query, or to my indexes, to make this happen? I tried using "except" rather than "not in <subquery>"; this isn't exactly identical as "except" is required to return sorted results, which I don't need; when the limit clause is in effect I don't care which of the possible ids are returned and I never care about the order. In this case the runtimes are as follows: limit normal runtime explain-anlyse runtime 10 15 54 20 15 55 40 19 94 80 9 55 160 20 68 320 10 70 Note that again explain-analyse is slower than the normal runtime, but now by a factor of about 5 rather than the previous factor of about 60. Basically, the query runtimes are now essentially constant; I imagine that they would be flatter if the machine wasn't also running other processes. The query plan shows that it is doing a sequential pass over each of the tables, sorting and then doing a set-op - exactly as expected and OK for large limits, but not for small limits. I feel that it should be possible to do this in miliseconds, not seconds, using the existing indexes. This is with Postgresql 8.1 on Debian GNU/Linux. (BTW, this is for Decimail, my postgresql-based IMAP mail server. I have recently added tsearch2-based searching. It is still somewhat experimental but has never lost any mail. If you're interested, have a look at http://decimail.org/.) Many thanks for any advice, Phil. (You are welcome to CC: me in any replies)