Well, part of the problem is that “B-tree” is a pretty generic term. Pretty much every index is either some form of hash or some form of tree. We use the term B-tree even though pretty much no major index is actually a binary tree; they almost always have multiple branches per node. It’s also important to note that, while we may use the term B-Tree generically, there are many specialized trees in use today.

Historically, the conventional B+Tree has been the popular choice for database indexes. Rather than relying on “nodes” per se, a B+Tree is really based on pages. Nodes alone tend to be too fine-grained, so it’s a game of mapping the nodes to pages on disk, and the page size can affect the structure of the tree. In a typical B+Tree the values of the index are stored in the leaves of the tree, and those leaves are then linked together in order. An index scan works because the data in the leaves matches the data in the database record.

Say you’re doing something like SELECT ID FROM TABLE WHERE ID LIKE '%X%'. A basic index lookup won’t work for a query like that, so the DB would normally need to scan every row. But since ID (in this case) is indexed, all of the IDs in the table are stored both in the table rows themselves and in the ID index. Since there are more IDs per page in the ID index, and the ID index has all of the data the query needs (i.e. the IDs), there’s no reason to scan the table itself. Rather, you can just scan the index: load every page from the index, walk the nodes, and match the values against '%X%'. Much less I/O.

Now, things like GIN indexes, while still trees, store the data differently. Where a B+Tree stores the key values in the leaf nodes of the tree, a GIN index breaks the values down and stores parts of the keys in the individual nodes of the tree. To reconstruct the index value, you walk all of the nodes to the end of the branch. In a B+Tree, each node represents the range of keys along its branch. So, if you have 1-2-3-4-5-6 in a tree, the root of the tree may say 1-3 are on the left branch and 4-6 are on the right branch. Looking for 2, you go left. There the node says “1-2 are on the left branch, 3 is on the right branch”. You go left again and find the node that says “1 is to the left, 2 is to the right”. Go right, and you find 2. Tada! In a GIN index, if you index “ABCDEF”, there will be an A node, a B node, a C node, a D node, and so on. (And, please, appreciate that I am talking in broad-brush generalities here for the sake of exposition, not the specific details of implementations or algorithms.) So, simply, by the time you get to the bottom of the index, all you find is the pointer to the data, not the actual key that got you there. The key is the path through the tree.

The reason for this, especially for things like full-text search, is that a lot of words have a lot in common (notably prefixes), so it makes sense for specialized indexes to compact themselves by factoring out the shared prefixes. Each node basically says “all the words that start with ‘the’ (the, them, their, there, thespian) are this way”. So beneath a “the” node you might find a “spian” node (thespian with the ‘the’ removed). These indexes are more compact and better suited for text (vs., say, random GUIDs), because text has notable properties of information density and such. For plain numbers (especially random numbers), there’s no real value in it. This is why “scanning the index” doesn’t work well on these kinds of structures compared to a classic B-Tree.
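To make that contrast concrete, here’s a toy sketch in Python. This is my own broad-brush illustration, nothing like the real on-disk structures or PostgreSQL’s actual GIN layout: a B+Tree-style leaf chain that holds whole keys you can scan, versus a little prefix trie where the key only exists as the path down the tree.

    # Toy illustration only: real indexes page, balance and compress this very differently.

    # B+Tree-ish: leaves hold the full key values (plus row pointers) and are
    # chained together in order, so you can scan them without touching the table.
    btree_leaves = [
        {"keys": ["ABC123", "ABD456"], "next": 1},
        {"keys": ["XYZ789", "XYZ999"], "next": None},
    ]

    def index_only_scan(leaves, needle):
        """Walk the leaf chain and match '%needle%' using only the index pages."""
        hits, page = [], 0
        while page is not None:
            leaf = leaves[page]
            hits += [key for key in leaf["keys"] if needle in key]  # the keys live in the leaves
            page = leaf["next"]
        return hits

    # GIN/trie-ish (broad brush): nodes hold key fragments; the full key only
    # exists as the path, and the end of the branch is just a row pointer.
    trie = {"the": {"": "ptr:the", "m": "ptr:them", "ir": "ptr:their", "spian": "ptr:thespian"}}

    def trie_lookup(word):
        for prefix, children in trie.items():
            if word.startswith(prefix):
                return children.get(word[len(prefix):])  # reassemble the key from the path
        return None

    print(index_only_scan(btree_leaves, "XYZ"))  # ['XYZ789', 'XYZ999']
    print(trie_lookup("thespian"))               # 'ptr:thespian'

The point being: in the first shape the key material is sitting right there in the leaves for you to scan; in the second it isn’t, it’s smeared across the path.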
Nor does scanning work for a hash index, since a hash index stores the hash, not the key value. There’s nothing of value there to scan.
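Same toy style, just to show the shape (again my own sketch, not how any particular engine lays out its hash buckets):

    # Toy illustration: roughly the shape of a hash index, hash(key) -> row pointer.
    hash_index = {hash("ABC123"): "ptr:row1", hash("XYZ789"): "ptr:row2"}

    def hash_lookup(key):
        return hash_index.get(hash(key))   # exact-match only

    print(hash_lookup("XYZ789"))           # 'ptr:row2'
    # No leaf chain, no key text, no order: LIKE '%XYZ%' can't be answered from this.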
I simply can’t speak to the applicability of indexes on low-cardinality columns, as those run into different issues not directly related to searching the index. If you have 10M rows with a STATUS column of 1 or 2, and an index on that column, then you have a two-value index with a bazillion row pointers. Some systems (I can’t speak to PG in this regard) degenerate in this kind of use case, since the index is more or less designed to work great on unique identifiers rather than low-cardinality values; the representation of huge record-pointer lists may just not be handled well as an edge case. Historically, I try to avoid low-cardinality indexes, simply because I’ve had problems with them in the past (yes, decades ago, but lessons learned). In those cases I force high cardinality: here, maybe create a combined index of STATUS and ID, which turns an ostensibly low-cardinality index into a unique one. This is very important for heavily skewed indexes (like where most of the statuses are 1, but you mostly just want to look for the 2’s). But then it’s a matter of convincing the optimizer that the index isn’t utterly worthless no matter what you query for, so it doesn’t just table scan anyway.
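For what it’s worth, here’s that STATUS+ID idea as a toy sketch (Python, made-up scaled-down data): appending the ID makes every entry unique, and all of the rare 2’s end up in one contiguous key range.

    # Toy illustration with made-up data: 1,003 rows standing in for 10M skewed rows.
    from bisect import bisect_left, bisect_right

    rows_status = [1] * 1000 + [2] * 3                             # almost everything is status 1
    index = sorted((s, rid) for rid, s in enumerate(rows_status))  # composite (STATUS, ID) key

    lo = bisect_left(index, (2,))                 # first (2, *) entry
    hi = bisect_right(index, (2, float("inf")))   # just past the last (2, *) entry
    print([rid for _, rid in index[lo:hi]])       # [1000, 1001, 1002]

The rare status-2 rows come straight out of a tiny slice of the index, without wading through the millions of status-1 entries.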