Hi Nick,
I believe a second ordering, by id desc, will get your query to use the right index, and shouldn't be functionally different from what you would expect.
```
select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc, id desc limit 10;
```
I didn't look closely as to why from your data though. I'll leave it to more experienced people to comment as to why the planner misjudged your query badly. What happens when you raise the limit? Say to a 1000?
I believe a second ordering, by id desc, will get your query to use the right index, and shouldn't be functionally different from what you would expect.
```
select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc, id desc limit 10;
```
I didn't look closely as to why from your data though. I'll leave it to more experienced people to comment as to why the planner misjudged your query badly. What happens when you raise the limit? Say to a 1000?
On Tue, May 19, 2020 at 3:00 PM Nick Cleaton <nick@xxxxxxxxxxx> wrote:
The attached script builds a 10G test table which demonstrates a
problem that we have in production with postgresql 12.3-1.pgdg18.04+1
on ubuntu linux. Indexes:
test_orders_o_date_idx btree(o_date)
test_orders_customer_id_o_date_idx btree(customer_id, o_date)
We query for the most recent orders for sets of customers, and
sometimes none of those customers have any orders and the results are
empty:
explain analyze select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=24848.96..24870.67 rows=8686 width=1839) (actual
time=1.101..1.102 rows=0 loops=1)
Sort Key: o_date DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using test_orders_customer_id_o_date_idx on
test_orders (cost=0.43..17361.20 rows=8686 width=1839) (actual
time=1.047..1.047 rows=0 loops=1)
Index Cond: (customer_id = ANY
('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
Planning Time: 3.821 ms
Execution Time: 1.174 ms
(7 rows)
So far so good. But if we add a limit clause to the query then the
plan goes very wrong:
explain analyze select * from test_orders where
customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
order by o_date desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..1660.98 rows=10 width=1839) (actual
time=4990.424..4990.424 rows=0 loops=1)
-> Index Scan Backward using test_orders_o_date_idx on test_orders
(cost=0.43..1442355.43 rows=8686 width=1839) (actual
time=4990.423..4990.423 rows=0 loops=1)
Filter: (customer_id = ANY
('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
Rows Removed by Filter: 5000000
Planning Time: 0.063 ms
Execution Time: 4990.435 ms
Is there something we can adjust to get it to prefer
test_orders_customer_id_o_date_idx even when there's a limit clause ?