Tilmann Singer skrev: > The query works fine for the common cases when matching rows are found > early in the sorted large table, like this: > > testdb=# EXPLAIN ANALYZE > SELECT * FROM large_table lt > LEFT JOIN relationships r ON lt.user_id=r.contact_id > WHERE r.user_id = 55555 OR lt.user_id = 55555 > ORDER BY lt.created_at DESC LIMIT 10; > QUERY PLAN > but for the following user_id there are 3M rows in the large table > which are more recent then the 10th matching one. The query then does > not perform so well: > > > testdb=# EXPLAIN ANALYZE > SELECT * FROM large_table lt > LEFT JOIN relationships r ON lt.user_id=r.contact_id > WHERE r.user_id = 12345 OR lt.user_id = 12345 > ORDER BY lt.created_at DESC LIMIT 10; > QUERY PLAN > When split it up into the two following queries it performs much > better for that user_id. Since the results of the two could be > combined into the desired result, I'm assuming it could also be done > efficiently within one query, if only a better plan would be used. > > > testdb=# EXPLAIN ANALYZE > SELECT * FROM large_table lt > WHERE lt.user_id = 12345 > ORDER BY lt.created_at DESC LIMIT 10; > QUERY PLAN > testdb=# EXPLAIN ANALYZE > SELECT * FROM large_table lt > WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345) > ORDER BY created_at DESC LIMIT 10; > QUERY PLAN > I'm not very experienced reading query plans and don't know how to go > about this from here - is it theoretically possible to have a query > that performs well with the given data in both cases or is there a > conceptual problem? How does the "obvious" UNION query do - ie: SELECT * FROM ( SELECT * FROM large_table lt WHERE lt.user_id = 12345 UNION SELECT * FROM large_table lt WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345) ) q ORDER BY created_at DESC LIMIT 10; ? How about SELECT * FROM large_table lt WHERE lt.user_id = 12345 OR user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345) ORDER BY created_at DESC LIMIT 10; ? I am missing a unique constraint on (user_id, contact_id) - otherwise the subselect is not equivalent to the join. Probably you also should have foreign key constraints on relationships.user_id and relationships.contact_id. These are unlikely to affect performance though, in my experience. It might be good to know whether contact_id = user_id is possible - since this would rule out the possibility of a row satisfying both branches of the union. Nis ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq