Re: Query with order by and limit is very slow - wrong index used

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> a9-dev=> explain analyze select * from records where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;
>                                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..489.57 rows=200 width=1126) (actual time=99701.773..99703.858 rows=200 loops=1)
>  ->  Index Scan using records_pkey on records  (cost=0.00..2441698.81 rows=997489 width=1126) (actual time=99684.878..99686.936 rows=200 loops=1)
>        Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 99705.916 ms
> (4 rows)
>
> a9-dev=> explain analyze select * from records2 where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id2 limit 200;
>                                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..770.01 rows=200 width=1124) (actual time=0.076..0.205 rows=200 loops=1)
>  ->  Index Scan using source_id2_id2_id2x on records2  (cost=0.00..3735751.15 rows=970308 width=1124) (actual time=0.074..0.180 rows=200 loops=1)
>        Index Cond: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 0.235 ms
> (4 rows)
>
>
> First one uses records_pkey, and with estimated cost 2441698.81 runs in over 1,5 minute.
> Second uses index on (source_id, id) and with estimated cost 3735751.15 runs in 235 miliseconds.
>
> IMHO, problem lies not in records distribution nor normalization, but in planner's wrong cost estimation. I don't know to tell/force him to use proper index.

Getting information on your current configuration should help.
Please see http://wiki.postgresql.org/wiki/Slow_Query_Questions

You should take care of the cache effect of your queries between your
tests, here it is not a problem, but this explain  was way longer for
this similar query.

a9-dev=> explain analyze select * from records where source_id
='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id
limit 200;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..757.51 rows=200 width=1126) (actual
time=43.648..564.798 rows=200 loops=1)
 ->  Index Scan using source_id_id_idx on records
(cost=0.00..1590267.66 rows=419868 width=1126) (actual
time=43.631..564.700 rows=200 loops=1)
       Index Cond: ((source_id)::text =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 564.895 ms



-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux