On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote: > I think the non-extended stats code also has trouble with booleans. > pg_stats gives me a correlation of 0.8 or higher for the flag column. It's not due to the boolean though; you see the same thing if you do: CREATE INDEX aaa_f ON aaa((flag::text)); ANALYZE aaa; correlation | 0.81193 or: ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int correlation | 0.81193 I think it's caused by having so few (2) values to correlate. most_common_vals | {f,t} most_common_freqs | {0.9014,0.0986} correlation | 0.822792 It thinks there's somewhat-high correlation since it gets a list of x and y values (integer positions by logical and physical sort order) and 90% of the x list (logical value) are the same value ('t'), and the CTIDs are in order on the new index, so 90% of the values are 100% correlated. It improves (by which I mean here that it spits out a lower number) if it's not a 90/10 split: CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id; CREATE INDEX ON aaa5 (flag); tablename | aaa5 attname | flag correlation | 0.522184 Justin