Thanks Tom, that sounds like what is happening. Some additional comments/questions inline. Tom Lane-2 wrote > I think what must be happening is that the planner notes the maximum > possible value of v.id and supposes that the mergejoin will stop far short > of completion because v.id spans just a small part of the range of > usagestats.tid. Which it does, when you have only the nonselective filter > condition on usagestats. However, the executor cannot stop until it's > fetched a usagestats row that has a tid value larger than the last v.id > value; otherwise it can't be sure it's emitted all the required join rows. Ok, that makes a lot of sense. It is scanning the tid index though, so once it gets past the last value in v.id isn't it guaranteed that there can be no more required join rows? Even if it sees tid = 5000 and type = 'aircraft' then it can know there are no more tids less than 5000. It must be that it waits to do this check until it gets a row that matches the filter, maybe this is an optimization in most cases? Seems like the cost of the check would be small enough compared to the cost of looking up the next row to do it every time. Tom Lane-2 wrote > I'm guessing that the "type = 'vehicle'" condition eliminates all such > rows, or at least enough of them that a very large part of the usagestats > table has to be scanned to find the first can't-possibly-match row. Yes, you are exactly right. Tom Lane-2 wrote > I'm not sure there's anything much we can do to improve this situation > in Postgres. It seems like a sufficiently bizarre corner case that it > wouldn't be appropriate to spend planner cycles checking for it, and > I'm not sure how we'd check for it even if we were willing to spend those > cycles. You might consider altering the query, or inserting some kind of > dummy sentinel row in the data, or changing the schema (is it really > sensible to keep vehicle usagestats in the same table as other > usagestats?). A brute-force fix would be "enable_mergejoin = off", but > that would prevent selecting this plan type even when it actually is > a significant win. I agree it may make sense to change the schema, although there are some good reasons to have it this way (I obfuscated the table names). If we partitioned the table on "type" then would the planner be able to stop after finishing the 'vehicle' type partition? Tom Lane-2 wrote > Actually, an easy fix might be to create a 2-column index on > usagestats(type, tid). I think the planner should be able to > use that to produce sorted output for the mergejoin, and you'd > get the best of both worlds, because the indexscan will stop > immediately when it's exhausted the rows with type = 'vehicle'. Actually there is an index on (type, tid) and it doesn't help. I just tried adding an index on (tid, type) and it partially fixed the issue, judging by the page hits, it looks like it is still scanning all the rows of the compound index, but no longer needs to go to the actual table. This takes 600ms instead of the 11,500ms of the original query, but still much more than the 60ms when you change the type='vehicle' condition to freq > -1. So it isn't a perfect solution. We could also switch to enum values for the type field which may reduce the (tid, type) index size enough to make the performance adequate, but it would be best if we can just get it to quit the scan early, so the performance doesn't degrade if the table grows significantly. Best, Jake -- View this message in context: http://postgresql.nabble.com/Merge-Join-chooses-very-slow-index-scan-tp5842523p5842603.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance