Search Postgresql Archives

Re: planer picks a bad plan (seq-scan instead of index) when adding an additional join

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



1. You MUST sequence scan dvds, as there is no way to do an index search on a like with % at the beginning. 2. You are asking for a left join on dvds, which means you want all records, so you must sequence scan dvds. The filters are all OR, so you can't say that a records is excluded until AFTER the join is completed.

true, but thats fast (200ms).

3. The join estimates that 39900 rows will come out, but only 2 do, thats out by a large factor 10^4, which means that it's not going to join movies after the filter is applied.


20866 is the total number of rows in the dvd table. the planer is now showing the accurate rate after another (auto)vacuum run. of course it can't know the estimate of a '%...' comparsion, so estimating the full result set is ok.

now, an additional table (containing 600k records) is added through a left join. all the sudden the query takes 24sec. although there are indices on both dvds.dvd_ean and data_soundmedia.sm_info_ean, the planer does not make use of the indices but rather chooses to do 2 seq-scans.
The 2 items from the first query are still true. You just have a lot more records to play with now by joining in a 3rd table. Which means there is no way to reduce the possible output set before the join.


well, under normal cases, the output set would be determined by index lookups (see bottom)


-> Sort (cost=286162.37..287781.38 rows=647601 width=660) (actual time=19336.011..20328.247 rows=646633 loops=1) This sort here is where nearly all of the 24 seconds goes. I am assuming at this point that the Merge Join is chosen because of the stats problem listed next;


exactly. but that sort shouldn't happen as there is an index on the join-field, and that is usualy pretty fast (~400ms), but not here...


>  Merge Left Join  (cost=309190.05..313899.09 rows=159086 width=1722)
(actual time=19876.552..21902.007 rows=20866 loops=1)
This estimate is out by a factor of 10. I'd increase the statistics on the join columns to see if it decides to use a different join method.

If the estimate for the Merge join is fixed to be closed, then it's likely an index scan would be chosen, 159000 is about 25% of the table. Assuming it's small on disk then it's faster to do a seq_scan than all the random I/O to read the index, then the heap to produce results.


the discs are not the fastest, but there is plenty of free ram available. seq_page_cost is in its default state (1.0). should i raise this?

How big is data_soundmedia? Mb size, not rows.
How much is your sort_mem/work_mem?

data_soundmedia is 195mb + 105mb for indices (6 fields indexed).
work_mem = 30MB, sort_mem is undefined in 8.2's postgresql.conf

what troubles me is that its only slow with this table (data_soundmedia). we have other tables (400k and 200k entries) that are joined with the same query in under 400ms total. these tables do have the exact same structure and indices defined, but in these joins the planer properly uses an Index Scann...

if i force enable_seqscan = off, the planer makes use of the index, resulting in acceptable query speed:

Nested Loop Left Join (cost=8402.16..257761.36 rows=83223 width=1067) (actual time=361.931..713.405 rows=2 loops=1) -> Hash Left Join (cost=8402.16..11292.37 rows=20873 width=407) (actual time=322.085..666.519 rows=2 loops=1)
       Hash Cond: (dvds.dvd_mov_id = movies.mov_id)
Filter: ((lower((movies.mov_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_name)::text) ~~ '%superman re%'::text) OR (lower((dvds.dvd_edition)::text) ~~ '%superman re%'::text)) -> Seq Scan on dvds (cost=0.00..804.73 rows=20873 width=193) (actual time=11.781..329.672 rows=20866 loops=1) -> Hash (cost=8194.93..8194.93 rows=82893 width=214) (actual time=200.823..200.823 rows=37418 loops=1) -> Seq Scan on movies (cost=0.00..8194.93 rows=82893 width=214) (actual time=0.070..155.178 rows=37418 loops=1) -> Index Scan using data_soundmedia_info_ean_idx on data_soundmedia (cost=0.00..11.76 rows=4 width=660) (actual time=23.424..23.428 rows=1 loops=2) Index Cond: ((data_soundmedia.sm_info_ean)::text = (dvds.dvd_ean)::text)
Total runtime: 716.988 ms

2 rows fetched (821 ms)


could it be the index gets somehow corrupted? but on the other hand, if i do a TRUNCATE before loading new data, it should be rebuild anyway, shouldn't it?

thanks,
thomas



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux