Re: Simple join doesn't use index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday, January 28, 2013, Alex Vinnik wrote:
It sure turned out that default settings are not a good fit. Setting random_page_cost to 1.0 made query to run in 2.6 seconds and I clearly see that indexes are being used in explain plan and IO utilization is close to 0.

This is not surprising.  The default settings are aimed at quite small servers, while you seem to have a rather substantial one.  Have you done anything yet to fix effective_cache_size?

 
 
QUERY PLAN
Sort  (cost=969787.23..970288.67 rows=200575 width=8) (actual time=2176.045..2418.162 rows=241238 loops=1)


However I noticed that sorting is done using disk("external sort  Disk: 4248kB")

As far as pgsql is concerned, it is using the disk.  But the kernel is probably buffering that to an extent that the disk is probably being touched rather little.  So I wouldn't worry about it.


 
which prompted me to take a look at work_mem. But it turned out that small increase to 4MB from default 1MB turns off index usage and query gets x10 slower. IO utilization jumped to 100% from literally nothing. so back to square one...

QUERY PLAN
Sort  (cost=936642.75..937144.19 rows=200575 width=8) (actual time=33200.762..33474.443 rows=241238 loops=1)

And why should the IO utilization have jumped?  Is everything in memory, or is it not?  You should run your EXPLAINs with (analyze, buffers), and also you should turn on track_io_timings, at least in the local session; that will give us some insights.

If everything is in memory, then why is the seq scan taking so long?  If not, then why is the nested loop such a good idea?  (In my hands, when everything does *not* fit in memory, the nested loop is very very bad)
 
You seem have a bit of an infatuation with Dec 15th, running that one query over and over and over.  Why?  If the real 
live query is not just for that one day repeatedly, then you should test with different days, not just one day repeatedly.  (And if your real query really is like the movie "Groundhog Day", you should probably cluster or partition with that in mind.)

Anyway, there was an issue introduced in 9.2.0 and to be removed in 9.2.3 which over-penalized nested loops that had large indexes on the inner side.  Since your different plans are so close to each other in estimated cost, I think this issue would be enough to tip it into the seq scan.  Also, your poor setting of effective_cache_size might also be enough to tip it. And both combined, almost certainly are.

But ultimately, I think you are optimizing for a case that does not actually exist.

Cheers,

Jeff

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux