On Mon, Dec 19, 2016 at 12:05 PM, Christophe Escobar <christophe.esco@xxxxxxxxx> wrote:
Hi,
I am using PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit.I have a notification table with about ~45 000 000 rows.I have some performance issues when trying to fetch rows from the table with a specific query, I suspect the planner to choose the wrong index because of the limit.
The query look like this: SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC LIMIT 20;
It returns 0 rows for this example.The table looks like this with the indices:Table "notifications"Column | Type | Modifiers----------------------------+-----------------------------+- ------------------------------ ----------------------------- id | integer | not null default nextval('notifications_id_seq'::regclass) account_id | integer |version_id | integer |item_id | integer |type | character varying(255) |created_at | timestamp without time zone |updated_at | timestamp without time zone |meta_data | text |bucket_id | integer |Indexes:"notifications_pkey" PRIMARY KEY, btree (id)"index_notifications_on_account_id" btree (account_id) "index_notifications_on_bucket_id" btree (bucket_id) "index_notifications_on_item_id" btree (item_id) "index_notifications_on_created_at_and_bucket_id" btree (created_at, bucket_id) "index_notifications_on_type_and_bucket_id" btree (type, bucket_id) "index_notifications_on_version_id" btree (version_id)
Before testing, I have run a VACUUM ANALYZE, and here are the statistics on the indices:SELECT relname, relkind, reltuples, relpagesFROM pg_classWHERE relname LIKE '%notifications%';relname | relkind | reltuples | relpages---------------------------------------------------+-------- -+-------------+---------- notifications_pkey | i | 4.55221e+07 | 124820index_notifications_on_account_id | i | 4.55221e+07 | 124820 index_notifications_on_bucket_id | i | 4.55221e+07 | 124819 index_notifications_on_item_id | i | 4.55221e+07 | 124821 index_notifications_on_version_id | i | 4.55221e+07 | 124821 index_notifications_on_created_at_and_bucket_id | i | 4.55221e+07 | 175281 index_notifications_on_type_and_bucket_id | i | 4.55221e+07 | 188281 notifications_id_seq | S | 1 | 1notifications | r | 4.55225e+07 | 566412
I tried three different EXPLAIN ANALYZE, on a subset of my table (with the entire table, I have yet to see what is the total duration of the query when using LIMIT 20, but it takes more than 5 minutes which is not acceptable for my use case).** Without limit **
EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC;------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------ Sort (cost=7474.92..7480.08 rows=2061 width=187) (actual time=0.149..0.149 rows=0 loops=1)Sort Key: created_atSort Method: quicksort Memory: 25kB-> Bitmap Heap Scan on notifications (cost=71.68..7361.47 rows=2061 width=187) (actual time=0.136..0.136 rows=0 loops=1)Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) -> Bitmap Index Scan on index_notifications_on_type_and_bucket_id (cost=0.00..71.16 rows=2061 width=0) (actual time=0.135..0.135 rows=0 loops=1) Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) Planning time: 0.715 msExecution time: 0.198 ms
** With LIMIT 20 **EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC limit 20;------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------ Limit (cost=0.43..3341.84 rows=20 width=187) (actual time=60133.701..60133.701 rows=0 loops=1)-> Index Scan Backward using index_notifications_on_created_at_and_bucket_id on notifications (cost=0.43..344332.66 rows=2061 width=187) (actual time=60133.695..60133.695 rows=0 loops=1) Filter: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) Rows Removed by Filter: 3441510Planning time: 1.034 msExecution time: 60133.740 msThanks in advance for any help,
** With limit 50 **
EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC limit 50;------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ------------ Limit (cost=7429.94..7430.06 rows=50 width=187) (actual time=0.111..0.111 rows=0 loops=1)-> Sort (cost=7429.94..7435.09 rows=2061 width=187) (actual time=0.110..0.110 rows=0 loops=1)Sort Key: created_atSort Method: quicksort Memory: 25kB-> Bitmap Heap Scan on notifications (cost=71.68..7361.47 rows=2061 width=187) (actual time=0.107..0.107 rows=0 loops=1)Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) -> Bitmap Index Scan on index_notifications_on_type_and_bucket_id (cost=0.00..71.16 rows=2061 width=0) (actual time=0.105..0.105 rows=0 loops=1) Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) Planning time: 0.151 msExecution time: 0.139 ms
As you can see, when I have the LIMIT 20, the execution time takes around 1 minutes (on a very small subset of the entire table).
Actually I have tried different LIMIT, and when the LIMIT is <= 45, it will use the index scan backward.
Removing the index 'index_notifications_on_created_at_and_bucket_id' may prevent the planner from choosing the index scan backward for this query, but this index is used for other querying on that table...
1) Why is the planner changing index scanning at the threshold of 45 for the LIMIT ? Why not 50 ? 100 ? I may take the solution in my application to have a LIMIT > 45 in order to prevent the performance issue, but am I sure that this threshold will always be the same ?
2) Is it possible for a specific query to force the planner on choosing a given index or preventing it from choosing one ?
What kind of other options do I have to solve this performance issue ?
Regards,--Christophe Escobar
You can temporarily disable index scanning for a session with
SET enable_indexscan = off;SET enable_indexonlyscan = off;
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.