On Tue, 2006-06-27 at 16:14 +0400, Andrew Sagulin wrote:
> Results showed that there were no page seq scans at all - true random
> access only.
> A simple model which can explain the situation: the sequence of
> numbers 2, 1, 4, 3, 6, 5, ..., 100, 99 has a correlation of about
> 0.9994. Let's imagine it's the page order of an index scan. Hmm, a
> bad choice, isn't it?

Your example is only possible if whole blocks of data were out of
order, which I guess is possible within a multi-million row table.
Slightly out-of-order values would be ignored, since I/O works at the
block level rather than the tuple level.

ANALYZE doesn't cope well with tables as large as yours. It doesn't
sample enough rows, nor does it look within individual blocks/groups
to discover anomalies such as yours. As a result, the data that is
sampled looks almost perfectly ordered, even though the bulk of the
table is not.

I think what you are also pointing out is that the assumed effect of
correlation doesn't match the current readahead logic of filesystems.
If we were to actively force a readahead stride of 2 for this scan
(somehow), then the penalty from the misordering would disappear
completely. IIRC the current filesystem readahead logic would treat
such a sequence as random, and so no readahead would be performed at
all - even though the data is highly correlated. That isn't
PostgreSQL's fault directly, since the readahead ought to work better
than it does, but we fail indirectly by relying upon it in this case.

> I think indexCorrelation can help to estimate page count but not
> page fetch cost. Why not use the formula
>
>     min_IO_cost = ceil(indexSelectivity * T) * random_page_cost
>
> instead of
>
>     min_IO_cost = ceil(indexSelectivity * T) ?

That part is sensible as it stands. The min_IO_cost is the cost when
the access is perfectly sequential, which by definition is 1.0 per
page, so multiplying by random_page_cost there would be wrong. The bit
you might take issue with is how we extrapolate from min_IO_cost and
the correlation to arrive at the final cost.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
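
For reference, the 0.9994 figure quoted above is straightforward to
reproduce: it is the Pearson correlation of position against value for
the pairwise-swapped sequence. A minimal standalone sketch in C
(illustration only, not PostgreSQL code):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int     i, n = 100;
    double  sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;

    for (i = 1; i <= n; i++)
    {
        /* odd positions hold i + 1, even positions hold i - 1,
         * giving the sequence 2, 1, 4, 3, 6, 5, ..., 100, 99 */
        double  x = i;
        double  y = (i % 2) ? i + 1 : i - 1;

        sx += x;
        sy += y;
        sxx += x * x;
        syy += y * y;
        sxy += x * y;
    }

    /* standard Pearson correlation coefficient */
    printf("correlation = %.4f\n",
           (n * sxy - sx * sy) /
           sqrt((n * sxx - sx * sx) * (n * syy - sy * sy)));
    return 0;
}

This prints "correlation = 0.9994": a near-perfect statistic even
though every single page is out of place, so a strictly sequential
read never actually happens.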
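
On the extrapolation point at the end: the planner interpolates
between the fully-correlated (sequential) cost and the uncorrelated
(random) cost using the square of indexCorrelation. The sketch below
is a simplified paraphrase of that logic from memory of costsize.c,
not the actual cost_index() function; the table size and selectivity
are made-up numbers, and random_page_cost is at its default of 4.0.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double  T = 1000000.0;             /* assumed: pages in the table */
    double  indexSelectivity = 0.01;   /* assumed: fraction fetched */
    double  indexCorrelation = 0.9994; /* value from the example above */
    double  random_page_cost = 4.0;    /* default GUC setting */

    double  pages_fetched = ceil(indexSelectivity * T);
    double  min_IO_cost = pages_fetched;  /* sequential: 1.0 per page */
    double  max_IO_cost = pages_fetched * random_page_cost; /* random */

    /* interpolate on the square of the correlation */
    double  csquared = indexCorrelation * indexCorrelation;
    double  IO_cost = max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

    printf("min = %.0f, max = %.0f, estimate = %.0f\n",
           min_IO_cost, max_IO_cost, IO_cost);
    return 0;
}

With correlation 0.9994 this prints an estimate of 10036 against a
min of 10000 and a max of 40000: the squared interpolation pins the
estimate to the sequential end, which is why a pathological page
ordering that still measures as highly correlated gets costed almost
as though it were a sequential read.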
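
As for the "(somehow)" in forcing a readahead stride of 2: one
conceivable mechanism, purely as a sketch, would be to hint the kernel
one block ahead of the swap with posix_fadvise() on each read.
PostgreSQL did nothing like this at the time; the helper below and its
block-numbering scheme are assumptions for illustration, with error
handling omitted.

#define _XOPEN_SOURCE 600

#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * Hypothetical helper: while reading one block, ask the kernel to
 * start fetching the block two positions ahead, so a pairwise-swapped
 * scan order such as 2, 1, 4, 3, ... is usually satisfied from the
 * page cache by the time we ask for it.
 */
static void
read_block_with_stride(int fd, long blkno, char *buf)
{
    (void) posix_fadvise(fd, (off_t) (blkno + 2) * BLCKSZ,
                         BLCKSZ, POSIX_FADV_WILLNEED);
    (void) pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ);
}

Whether kernels of that era would honour WILLNEED hints aggressively
enough is a separate question; the point is only that the planner's
correlation assumption and the kernel's readahead heuristics could in
principle be reconciled.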