Re: Questions about btree_gin vs btree_gist for low cardinality columns

Gavin Flower <GavinFlower@xxxxxxxxxxxxxxxxx> · Sat, 1 Jun 2019 20:24:00 +1200

On 01/06/2019 14:52, Morris de Oryx wrote:
[...]
> For an example, imagine an address table with 100M US street addresses
> with two character state abbreviations. So, say there are around 60
> values in there (the USPS is the mail system for a variety of US
> territories, possessions and friends in the Pacific.) Okay, so what's
> the best index type for state abbreviation? For the sake of argument,
> assume a normal distribution so something like FM (Federated States of
> Micronesia) is on a tail end and CA or NY are a whole lot more common.

[...]

I'd expect the distribution of values to be closer to a power law than
a normal distribution -- at the very least, a few states would have the
most lookups.  But this is my gut feel, not based on any scientific
analysis!
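
To put a rough shape on that gut feel, here is a minimal sketch of how
rows would spread across 60 state codes under a Zipf-like power law.
The function name zipf_shares, the exponent s = 1.0 and the rank
weighting 1/rank**s are my own assumptions for illustration, not
measurements of any real address data.

```python
# Hypothetical sketch: fraction of rows per rank under a Zipf-like
# power law. The exponent s=1.0 is an assumption, not a measured value.

def zipf_shares(n_values: int, s: float = 1.0) -> list[float]:
    """Return the row fraction for ranks 1..n_values, weighting rank r by 1/r**s."""
    weights = [1.0 / rank ** s for rank in range(1, n_values + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_shares(60)          # about 60 distinct state abbreviations
top5 = sum(shares[:5])            # the five most common values
bottom30 = sum(shares[30:])       # the thirty least common values
rows = 100_000_000                # table size from the example above

print(f"top 5 values:     {top5:.1%} of rows")
print(f"bottom 30 values: {bottom30:.1%} of rows")
print(f"rarest value:     about {shares[-1] * rows:,.0f} of {rows:,} rows")
```

Under these assumptions the five most common values cover far more of
the table than the thirty rarest combined, so an index lookup on a
common state and one on a rare state would touch very different
numbers of rows.  Real data may of course look nothing like this.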

Cheers,
Gavin