Re: Bitmap scan is undercosted? - boolean correlation

On Dec 3, 2017 15:31, "Tom Lane" <tgl@xxxxxxxxxxxxx> wrote:
> Jeff Janes <jeff.janes@xxxxxxxxx> writes:
>> On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
>>> It thinks there's somewhat-high correlation since it gets a list of x
>>> and y values (integer positions by logical and physical sort order) and
>>> 90% of the x list (logical value) are the same value ('t'), and the
>>> CTIDs are in order on the new index, so 90% of the values are 100%
>>> correlated.
>
>> But there is no index involved (except in the case of the functional
>> index).  The correlation of table columns to physical order of the table
>> doesn't depend on the existence of an index, or the physical order within
>> an index.
>
>> But I do see that ties within the logical order of the column values are
>> broken to agree with the physical order.  That is wrong, right?  Is there
>> any argument that this is desirable?
>
> Uh ... what do you propose doing instead?  We'd have to do something with
> ties, and it's not so obvious this way is wrong.

Let them be tied.  If there are 10 distinct values, number the values 0 to 9, and all rows with a given distinct value get the same number on the logical-order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to me.  Although if we switched btree to store duplicate values with tid as a tie breaker, then maybe it wouldn't be as obviously wrong.
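
To illustrate the difference, here is a rough Python sketch. It is a toy model, not the ANALYZE code: the function names are made up, and it assumes a 10,000-row table with a boolean column that is 90% true and placed at random, with a plain Pearson correlation between physical position and logical rank.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def corr_ties_broken_by_position(values):
    """Logical rank = position after sorting by (value, physical position).

    Equal values are ordered by where they sit in the table, so ties
    always agree with the physical order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: (values[i], i))
    logical_rank = [0] * n
    for rank, i in enumerate(order):
        logical_rank[i] = rank
    return pearson(list(range(n)), logical_rank)


def corr_ties_kept(values):
    """Logical rank = index of the distinct value (0, 1, 2, ...).

    All rows holding the same value share one rank, as proposed above.
    """
    rank_of = {v: k for k, v in enumerate(sorted(set(values)))}
    logical_rank = [rank_of[v] for v in values]
    return pearson(list(range(len(values))), logical_rank)


# Hypothetical table: 90% True, with no relation to physical position.
rng = random.Random(0)
values = [rng.random() < 0.9 for _ in range(10_000)]

broken = corr_ties_broken_by_position(values)
tied = corr_ties_kept(values)
print(f"ties broken by physical order: {broken:.2f}")
print(f"ties left tied:                {tied:.2f}")
```

With ties broken by physical order the result should land near the 0.8 figure, while leaving them tied should give something close to zero, since the column values have no real relationship to where the rows sit.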

Cheers,

Jeff 
