Re: query optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote on 26.04.2012 21:17:
Richard Kojedzinszky<krichy@xxxxxxxxxxxx>  writes:
Dear list,
We have a database schema, which looks the same as the attached script.

When filling the tables with data, and skipping analyze on the table (so
pg_stats contains no records for table 'a'), the first select in the
script runs fast, but after an analyze the planner decides to sequence
scan tables b and c, thus making the query much slower. Can somebody help
me solving this issue, or tuning our installation to not to use sequence
scans in this case?

Um ... did you analyze all the tables, or just some of them?  I get
sub-millisecond runtimes if all four tables have been analyzed, but it
does seem to pick lousy plans if, say, only a and b have been analyzed.


Here it's similar to Richard's experience:

Before analyzing the four tables, the first statement yields this plan:

Merge Left Join  (cost=504.89..2634509.91 rows=125000000 width=16) (actual time=0.103..0.108 rows=1 loops=1)
  Merge Cond: (a.b = b.id)
  ->  Sort  (cost=504.89..506.14 rows=500 width=8) (actual time=0.043..0.043 rows=1 loops=1)
        Sort Key: a.b
        Sort Method: quicksort  Memory: 17kB
        ->  Bitmap Heap Scan on a  (cost=12.14..482.47 rows=500 width=8) (actual time=0.028..0.029 rows=1 loops=1)
              Recheck Cond: (id = 4)
              ->  Bitmap Index Scan on a_idx1  (cost=0.00..12.01 rows=500 width=0) (actual time=0.021..0.021 rows=1 loops=1)
                    Index Cond: (id = 4)
  ->  Materialize  (cost=0.00..884002.52 rows=50000000 width=8) (actual time=0.041..0.057 rows=5 loops=1)
        ->  Merge Join  (cost=0.00..759002.52 rows=50000000 width=8) (actual time=0.037..0.051 rows=5 loops=1)
              Merge Cond: (b.id = c.id)
              ->  Index Scan using b_idx1 on b  (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.016..0.018 rows=5 loops=1)
              ->  Materialize  (cost=0.00..4626.26 rows=100000 width=4) (actual time=0.017..0.022 rows=5 loops=1)
                    ->  Index Scan using c_idx1 on c  (cost=0.00..4376.26 rows=100000 width=4) (actual time=0.014..0.017 rows=5 loops=1)
Total runtime: 0.209 ms

This continues to stay the plan for about 10-15 repetitions, then it turns to this plan

Hash Right Join  (cost=2701.29..6519.30 rows=1 width=16) (actual time=79.604..299.227 rows=1 loops=1)
  Hash Cond: (b.id = a.b)
  ->  Hash Join  (cost=2693.00..6136.00 rows=100000 width=8) (actual time=79.550..265.251 rows=100000 loops=1)
        Hash Cond: (b.id = c.id)
        ->  Seq Scan on b  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.011..36.158 rows=100000 loops=1)
        ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4) (actual time=79.461..79.461 rows=100000 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 2735kB
              ->  Seq Scan on c  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.010..32.930 rows=100000 loops=1)
  ->  Hash  (cost=8.28..8.28 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 1kB
        ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
              Index Cond: (id = 4)
Total runtime: 299.564 ms

(I guess autovacuum kicked in, because that the same plan I get when running analyze on all four tables right after populating them)

And the second one yields this one here (Regardless of analyze or not):

QUERY PLAN
Nested Loop Left Join  (cost=0.00..16.89 rows=1 width=16) (actual time=0.027..0.031 rows=1 loops=1)
  ->  Nested Loop Left Join  (cost=0.00..16.57 rows=1 width=12) (actual time=0.020..0.022 rows=1 loops=1)
        ->  Index Scan using a_idx1 on a  (cost=0.00..8.28 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
              Index Cond: (id = 4)
        ->  Index Scan using b_idx1 on b  (cost=0.00..8.28 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
              Index Cond: (a.b = id)
  ->  Index Scan using c_idx1 on c  (cost=0.00..0.31 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
        Index Cond: (b.id = id)
Total runtime: 0.104 ms


My version:
PostgreSQL 9.1.3, compiled by Visual C++ build 1500, 32-bit
Running on Windows XP SP3

shared_buffers = 768MB
work_mem = 24MB	
effective_cache_size = 1024MB

All other (relevant) settings are on defaults

Regards
Thomas





--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux