The attached script builds a 10 GB test table which demonstrates a problem we have in production with PostgreSQL 12.3-1.pgdg18.04+1 on Ubuntu Linux.

Indexes:
    test_orders_o_date_idx              btree (o_date)
    test_orders_customer_id_o_date_idx  btree (customer_id, o_date)

We query for the most recent orders for sets of customers, and sometimes none of those customers have any orders, so the result is empty:

explain analyze select * from test_orders where customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512]) order by o_date desc;

                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=24848.96..24870.67 rows=8686 width=1839) (actual time=1.101..1.102 rows=0 loops=1)
   Sort Key: o_date DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using test_orders_customer_id_o_date_idx on test_orders  (cost=0.43..17361.20 rows=8686 width=1839) (actual time=1.047..1.047 rows=0 loops=1)
         Index Cond: (customer_id = ANY ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
 Planning Time: 3.821 ms
 Execution Time: 1.174 ms
(7 rows)

So far so good.
But if we add a LIMIT clause to the query, the plan goes very wrong:

explain analyze select * from test_orders where customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512]) order by o_date desc limit 10;

                                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1660.98 rows=10 width=1839) (actual time=4990.424..4990.424 rows=0 loops=1)
   ->  Index Scan Backward using test_orders_o_date_idx on test_orders  (cost=0.43..1442355.43 rows=8686 width=1839) (actual time=4990.423..4990.423 rows=0 loops=1)
         Filter: (customer_id = ANY ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
         Rows Removed by Filter: 5000000
 Planning Time: 0.063 ms
 Execution Time: 4990.435 ms

Is there something we can adjust to get it to prefer test_orders_customer_id_o_date_idx even when there's a LIMIT clause?
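(One workaround that is often suggested for this plan shape — a sketch, not part of the original report — is to rewrite the query with a LATERAL subquery per customer, so that each customer's newest rows are fetched through the composite index, and only then take the global top 10. The inner ORDER BY o_date DESC LIMIT 10 leaves the planner with test_orders_customer_id_o_date_idx as the natural choice for each customer, so the backward scan of the whole date index is never attractive:

    select o.*
    from unnest(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512]) as c(customer_id)
    cross join lateral (
        select *
        from test_orders t
        where t.customer_id = c.customer_id
        order by t.o_date desc
        limit 10
    ) o
    order by o.o_date desc
    limit 10;

This scans at most 10 rows per listed customer regardless of how the planner estimates the ARRAY match, at the cost of a slightly more verbose query.)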
#!/usr/bin/python3
# Generates SQL on stdout that builds and populates the test_orders table;
# pipe the output into psql.
import random
import datetime

secs_in_day = 24 * 60 * 60
longstr = """iufdpoaiusoto3u5034534i5j345k345lku09s80s9dfjwer.,newrwwerwerwerlwerjlwejrlkewjr""" * 10

print("""
drop table if exists test_orders;
drop sequence if exists test_orders_id_seq;

CREATE SEQUENCE test_orders_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

CREATE TABLE test_orders (
    id integer DEFAULT nextval('test_orders_id_seq'::regclass) NOT NULL,
    o_date timestamp with time zone NOT NULL,
    customer_id integer,
    str1 text,
    num1 integer,
    long1 text,
    long2 text,
    long3 text,
    long4 text
);

COPY test_orders(o_date, customer_id, str1, num1, long1, long2, long3, long4) FROM stdin;""")

# For each of 5000 days, a sliding window of 1000 customer ids each places
# one order at a random time of day; rows are emitted in o_date order.
for day in range(5000):
    orders = [(secs_in_day * day + random.randrange(secs_in_day), customer)
              for customer in range(day, day + 1000)]
    for o_date, customer_id in sorted(orders):
        print(datetime.datetime.fromtimestamp(1234234234 + o_date).isoformat(),
              customer_id, "blah", random.randrange(1000000),
              longstr, longstr, longstr, longstr, sep="\t")

print("""\\.

create index test_orders_o_date_idx on test_orders using btree(o_date);
create index test_orders_customer_id_o_date_idx on test_orders using btree(customer_id, o_date);
analyze test_orders;
""")