Hi all,
I have a query I'd like to speed up. I am wondering whether the query
planner is capable of coming up with a certain kind of plan for this
query, and if I might tickle it into doing so, or if I have to
explicitly use subqueries to get what I want.
Imagine we have tables A, B, and C each having a one to one relationship.
SELECT A.val, B.val, C.val FROM A
JOIN B ON A.b_id = B.id
JOIN C ON B.c_id = C.id
ORDER BY A.rank
LIMIT 100;
Assume there are btree indices for all the "id" columns, but there is no
index on "rank." None of the fields are nullable. id columns are
unique. Assume that there are about 1M tuples in the result set without
the LIMIT. ANALYZE has run recently.
In this case, I want postgres to do a sequential scan and a top-N sort
against A since the ORDER BY only depends on columns of A, and then do
100 index scans to implement the joins.
What I'm observing is that it is doing sequential scans on A, B, and C,
joining, and then sorting. The cost of the sequential scans of B and C
is large enough to be painful.
Is what I want in the query planner's vocabulary? It would need to
exploit the fact that the _id columns are not nullable, and either
exploit the uniqueness of the id columns or do an extra LIMIT step after
the join. I think I want it to effectively give the same result I
expect (haven't tested) it would give for:
SELECT D.val, B.val, C.val FROM
(SELECT * FROM A ORDER BY A.rank LIMIT 100) AS D
JOIN B ON D.b_id = B.id
JOIN C ON B.c_id = C.id
LIMIT 100;
Perhaps there are reasons why this optimization is not safe that I
haven't thought about?
The query is being generated for use by an ORM, so changing it to use a
subquery for A and the ORDER BY+LIMIT is not so easy (plus, with
different user input, it might need B or C in the ORDER BY).
- Dave
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general