David Scott <davids@xxxxxxxxxxxxxx> writes:
> We are trying to ascertain if we are up against the limits of what
> postgres can accomplish without having the tables clustered.
> ...
> We are aware that there is a minimum time that is required to resolve
> the index values against the table to ascertain that they are live rows,
> and we believe we are circumventing that time to some extent by taking
> advantage of the rows being in physical order with the cluster.  So does
> this lead us to the conclusion that the difference in the query times
> is how long it takes us to check on disk whether or not these rows are live?

Both of your initial examples are bitmap scans, which should be pretty
insensitive to index correlation effects --- certainly the planner
assumes so.  What I'd want to know about is why the planner is picking
different indexes for the queries.  The CLUSTER may be affecting things
in some other way, for instance by squeezing out dead tuples and thereby
reducing the total table and index sizes.

The join examples use plain indexscans, which *would* be affected by
correlation ... but again, why are you getting a different scan plan for
"stuff" than in the non-join case?  It's not helping you that the
rowcount estimates are so far off; I suspect the different plans can be
explained by the noise in those estimates.  You should try increasing
the statistics targets on the columns you use in the WHERE conditions.

I'm not at all sure I believe your premise that querying for a different
key value excludes cache effects, btw.  On modern hardware it's likely
that CLUSTER would leave the *whole* of these tables sitting in kernel
disk cache.

			regards, tom lane
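
PS: by "increasing the statistics targets" I mean something along these
lines --- the column name here is just a placeholder for whichever
columns appear in your WHERE clauses, and the target value is only a
starting point to experiment with; remember to re-ANALYZE afterward so
the planner sees the new stats:

	ALTER TABLE stuff ALTER COLUMN somecol SET STATISTICS 100;
	ANALYZE stuff;

And if you want to check whether the CLUSTER shrank things, comparing
relation sizes before and after is easy enough ("stuff_pkey" is just a
guess at one of your index names):

	SELECT pg_size_pretty(pg_relation_size('stuff')),
	       pg_size_pretty(pg_relation_size('stuff_pkey'));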