On 27 February 2016 at 11:07, James Parks <james.parks@xxxxxxxxxx> wrote: > > CREATE TABLE a (id bigint primary key, nonce bigint); > CREATE TABLE b (id bigint primary key, a_id bigint not null); > CREATE INDEX a_idx ON b (a_id); > > The query: > > SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY b.id > ASC; > > (skip down to [1] and [2] to see the query performance) > > What I know: > > If you force the query planner to use a merge join on the above query, it > takes 10+ minutes to complete using the data as per below. If you force the > query planner to use a hash join on the same data, it takes ~200 > milliseconds. I believe I know what is going on here, but can you please test; SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a ON b.a_id = a.id AND a.nonce = ?) ORDER BY b.id ASC; using the merge join plan. If this performs much better then the problem is due to the merge join mark/restore causing the join to have to transition through many tuples which don't match the a.nonce = ? predicate. The mark and restore is not required for the rewritten query, as this use a semi join rather than a regular inner join. With the semi join the executor knows that it's only meant to be matching a single tuple in "a", so once the first match is found it can move to the next row in the outer relation without having to restore the scan back to where it started matching that inner row again. If I'm right, to get around the problem you could; create index on a (nonce, id); If such an index is out of the question then a patch has been submitted for review which should fix this problem in (hopefully) either 9.6 or 9.7 https://commitfest.postgresql.org/9/129/ If you have a test environment handy, it would be nice if you could test the patch on the current git head to see if this fixes your problem. The findings would be quite interesting for me. Please note this patch is for test environments only at this stage, not for production use. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance