On 19/12/2013 19:33, Jeff Janes wrote:
>                                                            QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.56..4001768.10 rows=479020 width=26) (actual time=2.303..15371.237 rows=479020 loops=1)
>    Output: path.pathid, batch.filename
>    Buffers: shared hit=2403958 read=7539
>    ->  Seq Scan on public.batch  (cost=0.00..11727.20 rows=479020 width=85) (actual time=0.340..160.142 rows=479020 loops=1)
>          Output: batch.path, batch.filename
>          Buffers: shared read=6937
>    ->  Index Scan using idx_path on public.path  (cost=0.56..8.32 rows=1 width=16) (actual time=0.030..0.031 rows=1 loops=479020)
>          Output: path.pathid, path.path
>          Index Cond: (path.path = batch.path)
>          Buffers: shared hit=2403958 read=602
>  Total runtime: 15439.043 ms
>
>
> As you can see, more than twice as fast, and a very high hit ratio
> on the path table, even if we start from a cold cache (I did, here,
> both PostgreSQL and OS). We have an excellent hit ratio because the
> batch table contains few different paths (several files in a
> directory), and is already quite clustered, as it comes from a
> backup, which is of course performed directory by directory.
>
>
> What is your effective_cache_size set to?
>
> Cheers,
>
> Jeff

Yeah, I had forgotten to set it correctly in this test environment (it is
set correctly in the production environments). Setting it to a few
gigabytes here gives me this cost:

bacula=# explain select pathid, filename from batch join path using (path);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.56..2083904.10 rows=479020 width=26)
   ->  Seq Scan on batch  (cost=0.00..11727.20 rows=479020 width=85)
   ->  Index Scan using idx_path on path  (cost=0.56..4.32 rows=1 width=16)
         Index Cond: (path = batch.path)
(4 rows)

It still chooses the hash join, though by a smaller margin.
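For anyone wanting to reproduce this comparison, here is a rough sketch of the session-level settings involved. The '4GB' value is my assumption standing in for "a few gigabytes", and turning off enable_hashjoin is just one way to force the nested loop so both estimates can be compared; it is not necessarily how the plans above were produced. The last query is a quick check of the "few different paths" claim.

```sql
-- Session-level only; postgresql.conf is not touched.
-- '4GB' is an assumed stand-in for "a few gigabytes".
SET effective_cache_size = '4GB';

-- Plan the planner picks on its own (a hash join, per the text above).
EXPLAIN SELECT pathid, filename FROM batch JOIN path USING (path);

-- Temporarily forbid hash joins to see the nested loop estimate and
-- its actual runtime/buffer usage side by side.
SET enable_hashjoin = off;
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT pathid, filename FROM batch JOIN path USING (path);
RESET enable_hashjoin;

-- How many distinct path values the batch rows actually probe.
SELECT count(*) AS batch_rows, count(DISTINCT path) AS distinct_paths
FROM batch;
```

Note that EXPLAIN ANALYZE really executes the query, which is harmless here since it is a plain SELECT.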
Also, it will still only access a very small part of path (always the same
5000 records) during the query, which, if I understand correctly, isn't
accounted for in the cost?

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance