On Wed, May 24, 2017 at 1:42 PM, Sam Saffron <sam.saffron@xxxxxxxxx> wrote:
I have this query that is not picking the right index unless I hard code dates:
SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= '2017-05-11 20:56:24'
"Index Scan using index_topics_on_last_unread_at on topics
(cost=0.41..8.43 rows=1 width=725) (actual time=0.005..0.065 rows=5
loops=1)"
" Index Cond: (last_unread_at >= '2017-05-11 20:56:24'::timestamp
without time zone)"
"Planning time: 0.136 ms"
"Execution time: 0.087 ms"
PostgreSQL knows that few entries come after 2017-05-11 (it thinks 1, actually 5) and comes up with a plan which works well for that situation.
SELECT "topics".* FROM "topics"
WHERE topics.last_unread_at >= (select first_topic_unread_at from
user_stats us where us.user_id = 1)
"Seq Scan on topics (cost=8.30..24030.12 rows=20555 width=725) (actual
time=3.186..59.636 rows=5 loops=1)"
" Filter: (last_unread_at >= $0)"
" Rows Removed by Filter: 61660"
" InitPlan 1 (returns $0)"
" -> Index Scan using user_stats_pkey on user_stats us
(cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.009 rows=1
loops=1)"
" Index Cond: (user_id = 1)"
"Planning time: 0.147 ms"
"Execution time: 59.671 ms"
At the time PostgreSQL plans this query, it doesn't know what the answer to the subquery is going to be. Not having the true answer at its fingertips, it guesses that one third of the table is going to fall after the results of that subquery.
Maybe it should first execute the subquery and then re-plan the rest of the query based on the results. But there is no provision for it to do that, and no concrete plans (that I know of) to implement such a thing.
The results here simply do not make sense to me, should I be piping
dates in here to avoid this issue and running 2 queries instead of 1?
That is the most pragmatic approach. It isn't very nice, but the alternatives are worse.
Cheers,
Jeff