Query with order by and limit is very slow - wrong index used

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Since it's my first on this list, I'd like to say "Hi guys" :)

Here is definition of my table:
a9-dev=> \d records;
                             Table "public.records"
                Column                |            Type             | Modifiers 
--------------------------------------+-----------------------------+-----------
 id                                   | bigint                      | not null
 checksum                             | character varying(32)       | not null
 data                                 | text                        | not null
 delete_date                          | timestamp without time zone | 
 last_processing_date                 | timestamp without time zone | 
 object_id                            | character varying(255)      | not null
 processing_path                      | character varying(255)      | not null
 schema_id                            | character varying(255)      | not null
 source_id                            | character varying(255)      | not null
 source_object_last_modification_date | timestamp without time zone | not null
Indexes:
    "records_pkey" PRIMARY KEY, btree (id)
    "unq_records_0" UNIQUE, btree (object_id, schema_id, source_id, processing_path)
    "length_processing_path_id_idx" btree (length(processing_path::text), id)
    "length_processing_path_idx" btree (length(processing_path::text))
    "object_id_id_idx" btree (object_id, id)
    "schema_id_id_idx" btree (schema_id, id)
    "schema_id_idx" btree (schema_id)
    "source_id_id_idx" btree (source_id, id)
    "source_id_idx" btree (source_id)
    "source_object_last_modification_date_id_idx" btree (source_object_last_modification_date, id)
    "source_object_last_modification_date_idx" btree (source_object_last_modification_date)

Average length of value of "data" column =  2991.7947061626100466

When I perform query such as this:  "select * from records where source_id = 'XXX' order by id limit 200;" I expect DB to use index source_id_id_idx  with XXX as filter. It is true for all but one values of XXX - when I ask for records with most common source_id, records_pkey index is used instead and performance is terrible! Explain analyze results below.

a9-dev=> explain analyze select * from records where source_id ='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;    
                                                                                                                                                                                                  QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..755.61 rows=200 width=1127) (actual time=75.292..684.582 rows=200 loops=1)
   ->  Index Scan using source_id_id_idx on records  (cost=0.00..1563542.89 rows=413849 width=1127) (actual time=75.289..684.495 rows=200 loops=1)
         Index Cond: ((source_id)::text = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
 Total runtime: 690.358 ms
(4 rows)

a9-dev=> explain analyze select * from records where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;   
                                                                                                                                                                                                     QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..466.22 rows=200 width=1127) (actual time=124093.485..124095.540 rows=200 loops=1)
   ->  Index Scan using records_pkey on records  (cost=0.00..2333280.84 rows=1000937 width=1127) (actual time=124093.484..124095.501 rows=200 loops=1)
         Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
 Total runtime: 124130.247 ms
(4 rows)


Some info about data distrubution:

a9-dev=> select min(id) from records;
  min   
--------
 190830
(1 row)

a9-dev=> select min(id), max(id) from records where source_id='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml';
   min   |   max   
---------+---------
 1105217 | 3811326
(1 row)
a9-dev=> select min(id), max(id) from records where source_id='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml';
   min   |   max   
---------+---------
 1544991 | 3811413
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id limit 200) as a;
  min   |   max   
---------+---------
1105217 | 1105416
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200) as a;
  min   |   max   
---------+---------
1544991 | 1545190
(1 row)



a9-dev=> select source_id, count(*) from records where source_id = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' or source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' group by source_id;
                       source_id                        | count  
--------------------------------------------------------+--------
 http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml    | 427254
 http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
(2 rows)

a9-dev=> select count(*) from records;
  count  
---------
 3620311
(1 row)


DB settings:

a9-dev=>  SELECT
a9-dev->    'version'::text AS "name",
a9-dev->    version() AS "current_setting"
a9-dev->  UNION ALL
a9-dev->  SELECT
a9-dev->    name,current_setting(name) 
a9-dev->  FROM pg_settings 
a9-dev->  WHERE NOT source='default' AND NOT name IN
a9-dev->    ('config_file','data_directory','hba_file','ident_file',
a9-dev(>    'log_timezone','DateStyle','lc_messages','lc_monetary',
a9-dev(>    'lc_numeric','lc_time','timezone_abbreviations',
a9-dev(>    'default_text_search_config','application_name',
a9-dev(>    'transaction_deferrable','transaction_isolation',
a9-dev(>    'transaction_read_only');
           name           |                                                 current_setting                                                 
--------------------------+-----------------------------------------------------------------------------------------------------------------
 version                  | PostgreSQL 9.0.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.6.0 20110530 (Red Hat 4.6.0-9), 64-bit
 lc_collate               | en_US.UTF-8
 lc_ctype                 | en_US.UTF-8
 listen_addresses         | *
 log_rotation_age         | 1d
 log_rotation_size        | 0
 log_truncate_on_rotation | on
 logging_collector        | on
 max_connections          | 100
 max_stack_depth          | 2MB
 port                     | 5432
 server_encoding          | UTF8
 shared_buffers           | 24MB
 TimeZone                 | Poland
(14 rows)


This query was always slow. Autovacuum is on, and I ran VACUUM ANALYZE manually few minutes before writing this email. 

Please help me with my problem. I'll be happy to provide any additional information if needed.
Michal Nowak


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux