Re: Questions about btree_gin vs btree_gist for low cardinality columns

Jeremy Finzel <finzelj@xxxxxxxxx> · Tue, 4 Jun 2019 08:37:22 -0500

On Sun, Jun 2, 2019 at 6:07 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Anyway, the larger point here is that right now btree_gin is just a quick

hack, and it seems like it might be worth putting some more effort into

it, because the addition of duplicate-compression changes the calculus

for whether it's useful.

Thank you to all for the thoughtful and thorough replies!

To clarify, I am definitely more interested in "low cardinality" in the sense NOT of a boolean or very few values, but rather enough values to where:
It's selective enough that having *some kind of index* actually will significantly speed up queries vs. a sequential scan
There are too many values to use partial indexes easily without it becoming a kind of maintenance nightmare
In our environment, we happen to have this kind of situation a lot.  For example, account codes, or other foreign key ids to lookup tables that categorize data in some way that is frequently queried, exist on tables with upwards of 100 million rows.  Typically it may be something like 50 to 500 unique values.

Historically, we have just had a bunch of regular btree indexes on these, and there are quite a lot of them indeed.

Here is my specific example:
Table has 127 million rows, including a toast field.  The table is 270GB
The filter is on a field with only 16 unique values.
The actual filter condition is filtering a join to 4 of the 16 unique values
As I said, performance was nearly identical for btree vs. gin.  But I was much happier with the memory usage of GIN given its tiny size:
Memory for join using btree: Buffers: shared hit=12 read=328991
Memory for join using gin: Buffers: shared hit=12 read=13961
The btree index here is 3.8GB, whereas the gin index is only 200MB.  But I have not tested how either index handles bloat.

Morris, I think the reason your gin index was clearly slower was precisely because your example uses an index-only scan - which gin does not support.

Thanks,
Jeremy