On Tue, Dec 4, 2012 at 10:03 AM,  <postgresql@xxxxxxxxx> wrote:
>
> Though that doesn't account for the 70x difference between the speed of
> the two queries in actuality, given a pretty similar expected speed (does
> it?).

It kind of does. The expected speed is predicated on the number of rows
being 200-fold higher. If the number of rows actually were that much
higher, the two speeds might be closer together. That is why it would be
interesting to see a more typical case, where the actual number of rows is
closer to the 2000 estimate.

But I am curious about how the cost estimate for the primary-key lookup is
arrived at:

Index Scan using cons_pe_primary_key on position_effect
    (cost=0.00..42.96 rows=1 width=16)

There should be a random page fetch for the index leaf page and another
for the heap page. Since you set random_page_cost to 2, that comes to 4.
Then there would be some almost negligible CPU costs. Where the heck is
the extra 38 cost coming from?

> It does go some way to explaining why a bad choice of plan was made.
>
> Is there some nice bit of literature somewhere that explains what sort of
> costs are associated with the different types of lookup?

I've heard good things about Greg Smith's book, but I don't know if it
covers this particular thing. Otherwise, I don't know of a good single
place that is a tutorial rather than a reference (or the code itself).

>>> First, make sure caching isn't interfering with your results. Run each
>>> query several times.
>>
>> If that is not how the production system works (running the same query
>> over and over), then you want to model the cold cache, not the hot one.
>>
>> But in any case, the posted explains indicate that all buffers were
>> cached.
>
> We are in the rather pleasant situation here in that we are willing to
> spend money on the box (up to a point, but quite a large point) to get it
> up to the spec so that it should hardly ever need to touch the disk; the
> trick is figuring out how to let our favourite database server know that.

Well, that part is fairly easy: make random_page_cost and seq_page_cost
much smaller than their defaults. For example, 0.04 and 0.03.

I think the *_page_cost settings should strictly be an estimate of the
cost of actually doing the IO, with a separate parameter to reflect the
likelihood of needing to do the IO, like *_page_cachedness. But that isn't
the way it is done currently.

Cheers,

Jeff
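
To put numbers on the back-of-envelope arithmetic above (a sketch only:
the 0.005 and 0.01 figures are the stock cpu_index_tuple_cost and
cpu_tuple_cost defaults, and the per-qual cpu_operator_cost terms are left
out as negligible):

    -- Expected cost of the one-row index scan under the reasoning above:
    -- two random page fetches at random_page_cost = 2, plus tiny CPU terms.
    SELECT 2 * 2.0       -- index leaf page + heap page, random_page_cost = 2
         + 0.005         -- cpu_index_tuple_cost (default)
         + 0.01          -- cpu_tuple_cost (default)
      AS expected_cost;  -- about 4.02, versus the planner's 42.96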
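
One way to verify the "all buffers were cached" observation quoted above,
on 9.0 or later, is EXPLAIN with the BUFFERS option (a sketch; the WHERE
clause is a hypothetical stand-in for the real primary-key lookup):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM position_effect WHERE position_effect_id = 12345;
    -- A "Buffers: shared hit=N" line with no "read=" component means every
    -- page came from shared_buffers, i.e. the cache was fully hot.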
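
And a minimal sketch of trying the lower page costs suggested above,
session-locally before persisting them in postgresql.conf (the query is
again a hypothetical stand-in):

    SET random_page_cost = 0.04;  -- default is 4.0
    SET seq_page_cost = 0.03;     -- default is 1.0
    -- Re-plan the problem query and see whether the plan choice changes:
    EXPLAIN ANALYZE
    SELECT * FROM position_effect WHERE position_effect_id = 12345;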