Re: query plan not optimal

Marc Cousin <cousinmarc@xxxxxxxxx> · Thu, 19 Dec 2013 20:00:16 +0100

On 19/12/2013 19:33, Jeff Janes wrote:
>                                                                 QUERY PLAN
>     ----------------------------------------------------------------------------------------------------------------------------------
>      Nested Loop  (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
>        Output: path.pathid, batch.filename
>        Buffers: shared hit=2403958 read=7539
>        ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
>              Output: batch.path, batch.filename
>              Buffers: shared read=6937
>        ->  Index Scan using idx_path on public.path  (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
>              Output: path.pathid, path.path
>              Index Cond: (path.path = batch.path)
>              Buffers: shared hit=2403958 read=602
>      Total runtime: 15439.043 ms
> 
> 
>     As you can see, it is more than twice as fast, with a very high hit
>     ratio on the path table even when starting from a cold cache (here,
>     both the PostgreSQL and the OS caches were cold). The hit ratio is
>     excellent because the batch table contains few distinct paths (many
>     files per directory) and is already well clustered: it comes from a
>     backup, which is of course performed directory by directory.
> 
> 
> What is your effective_cache_size set to?
> 
> Cheers,
> 
> Jeff
Yeah, I had forgotten to set it correctly in this test environment (it is
set correctly in production environments). Setting it to a few gigabytes
here gives me this cost:

bacula=# explain select pathid, filename from batch join path using (path);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 rows)
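A rough back-of-the-envelope check of where the cost difference comes from, assuming the nested-loop total is approximately the outer seq scan cost plus one inner index probe per outer row. This approximation ignores per-row CPU charges and the rounding of the displayed per-probe costs, so it does not reproduce the reported totals exactly:

```latex
\[
\text{cost}_{\text{nested loop}} \approx \text{cost}_{\text{outer}} + N_{\text{outer}} \times \text{cost}_{\text{probe}}
\]
\[
11727.20 + 479020 \times 8.32 \approx 3.997 \times 10^{6} \quad (\text{reported: } 4001768.10)
\]
\[
11727.20 + 479020 \times 4.32 \approx 2.081 \times 10^{6} \quad (\text{reported: } 2083904.10)
\]
\[
479020 \times (8.32 - 4.32) = 1916080 \quad (\text{reported drop: } 1917864)
\]
```

Under that approximation, nearly all of the reduction comes from the estimated cost of each index probe on path falling from 8.32 to 4.32 once effective_cache_size was raised.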

It still chooses the hash join, though by a smaller margin.
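One way to see how large that margin is, as a sketch: compare the default plan with the one chosen when hash joins are discouraged for the current session only. The '4GB' value is an assumption standing in for "a few gigabytes". Note that enable_hashjoin = off does not forbid hash joins, it only adds a large cost penalty, and the planner may fall back to a merge join rather than the nested loop.

```sql
-- Session-only setting; '4GB' is an assumed stand-in for "a few gigabytes".
SET effective_cache_size = '4GB';

-- The plan the planner prefers by default (the hash join, per the post).
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);

-- Penalize hash joins for this session so the next-best plan and its
-- estimated cost are shown (possibly a merge join, not necessarily the
-- nested loop).
SET enable_hashjoin = off;
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);

-- Restore the session defaults.
RESET enable_hashjoin;
RESET effective_cache_size;
```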

And the query still only accesses a very small part of path (always the
same 5000 records), which, if I understand correctly, isn't accounted for
in the cost?
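A sketch of how one might compare what the planner's statistics say about batch.path with the actual number of distinct paths (the post puts the number of path rows touched at about 5000). The pg_stats values are only populated after the table has been ANALYZEd. Keep in mind that correlation compares physical row order with sorted order, so rows grouped directory by directory but not inserted in sorted order can still show a low value.

```sql
-- What the planner believes about batch.path: estimated distinct values and
-- how closely physical row order follows the sorted order of path.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'batch'
  AND attname = 'path';

-- Actual number of distinct paths, i.e. how many path rows the join touches.
SELECT count(DISTINCT path) FROM batch;
```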

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance