Search Postgresql Archives

Re: Questions about btree_gin vs btree_gist for low cardinality columns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 2, 2019 at 6:07 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Anyway, the larger point here is that right now btree_gin is just a quick
hack, and it seems like it might be worth putting some more effort into
it, because the addition of duplicate-compression changes the calculus
for whether it's useful.

Thank you to all for the thoughtful and thorough replies!

To clarify, I am definitely more interested in "low cardinality" in the sense NOT of a boolean or very few values, but rather enough values to where:
  1. It's selective enough that having *some kind of index* actually will significantly speed up queries vs. a sequential scan
  2. There are too many values to use partial indexes easily without it becoming a kind of maintenance nightmare
In our environment, we happen to have this kind of situation a lot.  For example, account codes, or other foreign key ids to lookup tables that categorize data in some way that is frequently queried, exist on tables with upwards of 100 million rows.  Typically it may be something like 50 to 500 unique values.

Historically, we have just had a bunch of regular btree indexes on these, and there are quite a lot of them indeed.

Here is my specific example:
  • Table has 127 million rows, including a toast field.  The table is 270GB
  • The filter is on a field with only 16 unique values.
  • The actual filter condition is filtering a join to 4 of the 16 unique values
As I said, performance was nearly identical for btree vs. gin.  But I was much happier with the memory usage of GIN given its tiny size:
  • Memory for join using btree: Buffers: shared hit=12 read=328991
  • Memory for join using gin: Buffers: shared hit=12 read=13961
The btree index here is 3.8GB, whereas the gin index is only 200MB.  But I have not tested how either index handles bloat.

Morris, I think the reason your gin index was clearly slower was precisely because your example uses an index-only scan - which gin does not support.

Thanks,
Jeremy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux