Btree vs. GIN

Jack Orenstein <jao@xxxxxxxxxxxx> · Fri, 1 Jan 2021 12:20:49 -0500

I am working on a research project involving a datatype, D, with the following characteristics:

- A value of D is a variable-length binary string.

- A value of D usually gives rise to a single index term, but there could occasionally be more, (up to some configurable maximum).

- The index terms are int64.

GIN looks like a good fit for my requirements, and I've done a little experimentation (using btree_gin) to determine that the optimizer is working as I'd hoped.

So I'm going to proceed with GIN indexes, writing an extension for my datatype. But I'm wondering about btrees and GIN, in general. This discussion was pretty interesting: https://www.postgresql-archive.org/Questions-about-btree-gin-vs-btree-gist-for-low-cardinality-columns-td6088041.html.

My main questions are about the remaining differences between btree and GIN indexes. With btree key deduplication in Postgres 12, one of the main differences between btrees and GIN is gone, (my understanding is that GIN has done this optimization for a long time).

I think that one other major difference between the two is that GIN can handle multiple keys per row, while a btree cannot. Is there some fundamental reason why the btree cannot accommodate multiple keys per row? I think that this would have no impact on the btree structure itself. The only difference would be that the same row (ctid) could be associated with multiple keys.

I guess the top-level question is this: Is it possible in principle for the btree index to subsume GIN index capabilities?

Jack Orenstein