On Tue, Dec 05, 2017 at 01:50:11PM -0500, Tom Lane wrote:
> Jeff Janes <jeff.janes@xxxxxxxxx> writes:
> > On Dec 3, 2017 15:31, "Tom Lane" <tgl@xxxxxxxxxxxxx> wrote:
> >> Jeff Janes <jeff.janes@xxxxxxxxx> writes:
> >>> But I do see that ties within the logical order of the column values are
> >>> broken to agree with the physical order. That is wrong, right? Is there
> >>> any argument that this is desirable?
>
> >> Uh ... what do you propose doing instead? We'd have to do something with
> >> ties, and it's not so obvious this way is wrong.
>
> > Let them be tied.
...
> I thought some more about this. What we really want the correlation stat
> to do is help us estimate how randomly an index-ordered scan will access
> the heap. If the values we've sampled are all unequal then there's no
> particular issue. However, if we have some group of equal values, we
> do not really know what order an indexscan will visit them in. The
> existing correlation calculation is making the *most optimistic possible*
> assumption, that such a group will be visited exactly in heap order ---
> and that assumption isn't too defensible.

I'm interested in discussion regarding bitmap cost, since it would have
helped our case discussed here ~18 months ago:
https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@xxxxxxxxxxxxx

...but remember: in Vitaliy's case (as opposed to mine), the index scan is
*faster*, yet is being estimated at higher cost than the bitmap scan (I
have to keep reminding myself). So the rest of this discussion is about
the overly-optimistic cost estimate of index scans, which moves in the
opposite direction for this reported problem. For the test cases I looked
at, index scans were used when random_page_cost=1 and redundant conditions
were avoided, so I'm not sure whether there's any remaining issue (but I
haven't looked at the latest cases Vitaliy sent).

> In any case, given that we do this calculation without regard
> to any specific index,

One solution is to compute stats (at least correlation) for all indices,
not just expression indices. I did that earlier this year while throwing
ideas around (and out):
https://www.postgresql.org/message-id/20170707234119.GN17566%40telsasoft.com

> We do have an idea, from the data we have, whether the duplicates are close
> together in the heap or spread all over.

I think you just mean pg_stats.correlation, computed over all values rather
than just the duplicates (with the understanding that duplicates might be a
large fraction of the tuples, and so carry a high weight in the
correlation).

Another issue I noted in an earlier thread: as table sizes increase, the
existing correlation computation approaches 1 for correlated insertions
(like "append-only" timestamps clustered around now()). ANALYZE samples
only a fraction of the table, so the statistic represents large-scale
correlation while failing, to an increasing degree, to represent the
small-scale variation between adjacent index TIDs. That variation has a
real cost (and the mitigation of that cost by cache effects probably also
decreases as the table grows). I think any solution needs to handle this
somehow.

Generated data demonstrating this (I reused this query, so it's more
complicated than it needs to be):

[pryzbyj@database ~]$ time for sz in 9999{,9{,9{,9{,9}}}} ; do psql postgres -tc "DROP TABLE IF EXISTS t; CREATE TABLE t(i float, j int); CREATE INDEX ON t(i); INSERT INTO t SELECT i/99999.0+pow(2,(-random())) FROM generate_series(1,$sz) i ORDER BY i; ANALYZE t; SELECT $sz, correlation, most_common_freqs[1] FROM pg_stats WHERE attname='i' AND tablename='t'"; done
    9999 | 0.187146 |
   99999 | 0.900629 |
  999999 | 0.998772 |
 9999999 | 0.999987 |

Trying to keep it all in my own head: for a sufficiently large number of
pages, a bitmap scan should be preferred to an index scan, because the
reduced random-page cost outweighs its overhead in CPU cost. Probably
this should be done by penalizing index scans, not by discounting bitmap
scans. Conceivably a correlation adjustment could be conditionalized or
weighted based on index_pages_fetched(), something like:

	x = log(pages_fetched / 999999.0);	/* natural log; the knee is arbitrary */
	if (x > 1.0)
		correlation /= x;

I think one could look at the fraction of duplicated index keys expected
to be returned: if we expect to return 1000 tuples, with 200 duplicates
from the MCV list, cost_index would multiply the correlation by
(1 - 200/1000), meaning it would use something closer to max_IO_cost than
to min_IO_cost. I imagine it'd be possible to limit this to only those
MCVs which pass the quals - if none pass, then few tuples may be returned,
so no correction is applied to (additionally) penalize the index scan.
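To be concrete, here's a standalone sketch (not a patch) of how those two
adjustments might compose with the interpolation cost_index() already does
between min_IO_cost and max_IO_cost; adjusted_io_cost(), dup_frac, and the
999999 knee are placeholders of mine, not anything that exists in
costsize.c:

#include <math.h>
#include <stdio.h>

static double
adjusted_io_cost(double indexCorrelation,	/* pg_stats.correlation */
		 double dup_frac,	/* expected duplicate fraction of returned tuples */
		 double pages_fetched,	/* as from index_pages_fetched() */
		 double min_IO_cost,
		 double max_IO_cost)
{
	double	csquared;
	double	x;

	/* Idea 1: discount correlation by the duplicated-key fraction. */
	indexCorrelation *= (1.0 - dup_frac);

	/*
	 * Idea 2: additionally damp correlation for very large scans, since
	 * ANALYZE's sample captures only large-scale correlation.
	 */
	x = log(pages_fetched / 999999.0);
	if (x > 1.0)
		indexCorrelation /= x;

	/* The interpolation cost_index() performs today. */
	csquared = indexCorrelation * indexCorrelation;
	return max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
}

int
main(void)
{
	/* 1000 expected tuples, 200 of them duplicates from the MCV list */
	printf("%.1f\n", adjusted_io_cost(0.99, 200.0 / 1000, 5000, 100.0, 20000.0));
	return 0;
}

With those numbers the correlation is damped from 0.99 to 0.792, so
csquared drops from about 0.98 to 0.63 and the I/O estimate moves
noticeably toward max_IO_cost.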
In my tests, at one point I implemented idx_corr_fudge(), returning a value
like "fragmentation" from pgstatindex (which I'm sure is where I got the
phrase when reporting the problem). That only uses the leaf nodes' "next"
pointers, and not the individual tuples, which probably works if there's a
sufficient number of repeated keys.

I think that's all for now..

Justin