Search Postgresql Archives

PostgreSQL 8.3.3 chooses wrong query plan when LIMIT 1 added?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi everyone,

I'm experiencing a strange issue with PostgreSQL 8.3.3 whereby adding "LIMIT 1" to the query increases the query time from several 10s of ms to over 5s, and was wondering if anyone with more planner-fu can shed some light on this.

The database in question is being used to store information from SVN logs in a very simple format using a revision_files table given below:


svnlog=# \d revision_files
  Table "public.revision_files"
   Column    |  Type  | Modifiers
-------------+--------+-----------
 revision_id | bigint |
 file_id     | bigint |
Indexes:
    "revision_files_file_id_idx" btree (file_id)
    "revision_files_revision_id_idx" btree (revision_id)


What happens is that for each SVN revision, a row is inserted for each file currently within that revision so that obtaining a list of all the individual files within a given revision is nice and fast. As a result of this, the table grows quite quickly:


svnlog=# select count(*) from revision_files ;
  count
----------
 73886238
(1 row)


Now the problem comes with one particular query where I want to find the id of the last SVN revision that touched one particular file, which should be a reasonably instant query. However, what I am actually seeing is this:


svnlog=# SELECT revision_id FROM revision_files WHERE file_id=(SELECT file_id
FROM files WHERE filepath='/trunk/app/widgets/gimptoolbox-dnd.c' LIMIT 1)
ORDER BY revision_id DESC LIMIT 1;
 revision_id
-------------
       15011
(1 row)

Time: 5090.449 ms
svnlog=# explain analyze SELECT revision_id FROM revision_files WHERE file_id=(SELECT file_id
FROM files WHERE filepath='/trunk/app/widgets/gimptoolbox-dnd.c' LIMIT 1)
ORDER BY revision_id DESC LIMIT 1;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=16.32..354.70 rows=1 width=8) (actual time=5018.176..5018.177 rows=1 loops=1)
   InitPlan
-> Limit (cost=0.00..16.32 rows=1 width=8) (actual time=0.091..0.092 rows=1 loops=1) -> Index Scan using files_filepath_idx on files (cost=0.00..16.32 rows=1 width=8) (actual time=0.088..0.088 rows=1 loops=1) Index Cond: ((filepath)::text = '/trunk/app/widgets/gimptoolbox-dnd.c'::text) -> Index Scan Backward using revision_files_revision_id_idx on revision_files (cost=0.00..3756450.27 rows=11101 width=8) (actual time=5018.171..5018.171 rows=1 loops=1)
         Filter: (file_id = $0)
 Total runtime: 5018.237 ms
(8 rows)

Time: 5019.056 ms


Hmmmm so it seems to be favouring the backwards ordered index scan on revision_files_revision_id_idx rather than just pulling out the rows using revision_files_file_id_idx. This seems a strange choice given that the overall table size is quite large. The interesting part is that if I remove the LIMIT 1 I get this:


svnlog=# SELECT revision_id FROM revision_files WHERE file_id=(SELECT file_id
FROM files WHERE filepath='/trunk/app/widgets/gimptoolbox-dnd.c' LIMIT 1)
ORDER BY revision_id DESC;
 revision_id
-------------
       15011
       15010
	...
       14961
       14960
Time: 22.816 ms
svnlog=# explain analyze SELECT revision_id FROM revision_files WHERE file_id=(SELECT file_id
FROM files WHERE filepath='/trunk/app/widgets/gimptoolbox-dnd.c' LIMIT 1)
ORDER BY revision_id DESC;

  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=57994.92..58022.67 rows=11101 width=8) (actual time=42.904..50.153 rows=5297 loops=1)
   Sort Key: revision_files.revision_id
   Sort Method:  quicksort  Memory: 441kB
   InitPlan
-> Limit (cost=0.00..16.32 rows=1 width=8) (actual time=0.076..0.078 rows=1 loops=1) -> Index Scan using files_filepath_idx on files (cost=0.00..16.32 rows=1 width=8) (actual time=0.074..0.074 rows=1 loops=1) Index Cond: ((filepath)::text = '/trunk/app/widgets/gimptoolbox-dnd.c'::text) -> Index Scan using revision_files_file_id_idx on revision_files (cost=0.00..57232.71 rows=11101 width=8) (actual time=0.130..22.528 rows=5297 loops=1)
         Index Cond: (file_id = $0)
 Total runtime: 57.310 ms
(10 rows)

Time: 58.246 ms


So the part I'm not sure is why the revision_files_revision_id_idx index is favoured over revision_files_file_id_idx when LIMIT 1 is present? AFAICT the planner should favour the index scan on file_id given that the estimated cost in the second query is much lower than the one estimated for the revision_id index on the first query.


Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux