query planner placement of sort/limit w.r.t. joins

Dave Vitek <dvitek@xxxxxxxxxxxxxx> · Fri, 28 Apr 2017 15:59:17 -0400

Hi all,

I have a query I'd like to speed up.  I am wondering whether the query 
planner is capable of coming up with a certain kind of plan for this 
query, and if I might tickle it into doing so, or if I have to 
explicitly use subqueries to get what I want.

Imagine we have tables A, B, and C each having a one to one relationship.

SELECT A.val, B.val, C.val FROM A
   JOIN B ON A.b_id = B.id
   JOIN C ON B.c_id = C.id
   ORDER BY A.rank
   LIMIT 100;

Assume there are btree indices for all the "id" columns, but there is no 
index on "rank."  None of the fields are nullable.  id columns are 
unique.  Assume that there are about 1M tuples in the result set without 
the LIMIT.  ANALYZE has run recently.

In this case, I want postgres to do a sequential scan and a top-N sort 
against A since the ORDER BY only depends on columns of A, and then do 
100 index scans to implement the joins.

What I'm observing is that it is doing sequential scans on A, B, and C, 
joining, and then sorting.  The cost of the sequential scans of B and C 
is large enough to be painful.

Is what I want in the query planner's vocabulary?  It would need to 
exploit the fact that the _id columns are not nullable, and either 
exploit the uniqueness of the id columns or do an extra LIMIT step after 
the join.  I think I want it to effectively give the same result I 
expect (haven't tested) it would give for:

SELECT D.val, B.val, C.val FROM
   (SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
   JOIN B ON D.b_id = B.id
   JOIN C ON B.c_id = C.id
   LIMIT 100;

Perhaps there are reasons why this optimization is not safe that I 
haven't thought about?

The query is being generated for use by an ORM, so changing it to use a 
subquery for A and the ORDER BY+LIMIT is not so easy (plus, with 
different user input, it might need B or C in the ORDER BY).

- Dave

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general