Re: Performance hit on large row counts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Scott <davids@xxxxxxxxxxxxxx> writes:
> We are trying to ascertain if we are up against the limits of what 
> postgres can accomplish without having the tables clustered.  ...

> We are aware that there is a minimum time that is required to resolve 
> the index values against the table to ascertain that they are live rows, 
> and we believe we are circumventing that time to some extent by taking 
> advantage of the rows being in physical order with the cluster.  So does 
> this lead us to the conclusion that the differences in the query times 
> is how long is takes us to check on disk whether or not these rows are live?

Both of your initial examples are bitmap scans, which should be pretty
insensitive to index correlation effects --- certainly the planner
assumes so.  What I'd want to know about is why the planner is picking
different indexes for the queries.  The CLUSTER may be affecting things
in some other way, like by squeezing out dead tuples causing a
reduction in the total table and index sizes.

The join examples use plain indexscans, which *would* be affected by
correlation ... but again, why are you getting a different scan plan
for "stuff" than in the non-join case?

It's not helping you that the rowcount estimates are so far off.
I think the different plans might be explained by the noise in the
rowcount estimates.

You should try increasing the statistics targets on the columns you use
in the WHERE conditions.

I'm not at all sure I believe your premise that querying for a different
key value excludes cache effects, btw.  On modern hardware it's likely
that CLUSTER would leave the *whole* of these tables sitting in kernel
disk cache.

			regards, tom lane


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux