Well, part of the problem is that “B-tree” is a pretty generic term. Pretty much every index is either some form of hash or some form of tree. We use the term B-tree even though pretty much no major index is actually a binary tree; they almost always have multiple branches per node. It’s also important to note that, while we may use the term B-Tree generically, there are many specialized trees in use today.

Historically, the conventional B+Tree has been the popular choice for database indexes. Rather than relying on “nodes” per se, a B+Tree is really based on pages. Nodes alone tend to be too fine-grained, so it’s a game of mapping the nodes to pages on disk, and the page size can affect the structure of the tree. In a typical B+Tree the values of the index are stored in the leaves of the tree, and those leaves are then linked together in order. An index scan works because the data in the leaves matches the data in the database record.

Say you’re doing something like SELECT ID FROM TABLE WHERE ID LIKE '%X%'. A basic index lookup won’t work for a query like that, so the DB would normally need to scan every row. But since ID (in this case) is indexed, all of the IDs in the table are stored both in the table rows themselves and in the ID index. Since there are more IDs per page in the ID index, and the ID index has all of the data the query needs (i.e. the IDs), there’s no reason to scan the table itself. Rather, you can just scan the index: load every page from the index, walk the nodes, and match the values against '%X%'. Much less I/O.

Now, things like GIN indexes, while still trees, store the data differently. Where a B+Tree stores the key values in the leaf nodes of the tree, a GIN index breaks the values down and stores parts of the keys in the individual nodes of the tree. To reconstruct the index value, you walk all of the nodes to the end of the branch. In a B+Tree, each node represents the range of keys along its branch. So, if you have 1-2-3-4-5-6 in a tree, the root of the tree may say 1-3 are on the left branch and 4-6 are on the right branch. Looking for 2, you go left. There the node says “1-2 are on the left branch, 3 is on the right branch”. You go left again and find the node that says “1 is to the left, 2 is to the right”. Go right, and you find 2. Tada! In a GIN index, if you index “ABCDEF”, there will be an A node, a B node, a C node, a D node, and so on. (And, please, appreciate that I am talking in broad-brush generalities here for the sake of exposition, not the specific details of implementations or algorithms.) So, simply, by the time you get to the bottom of the index, all you find is the pointer to the data, not the actual key that got you there. The key is the path through the tree.

The reason for this, especially for things like full-text search, is that a lot of words have a lot in common (notably prefixes), so it makes sense for specialized indexes to compact themselves by factoring out the shared prefixes. Each node basically says “all the words that start with ‘the’ (the, them, their, there, thespian) are this way”. So beneath a “the” node you might find a “spian” node (thespian with the ‘the’ removed). These indexes are more compact and better suited for text (vs., say, random GUIDs), because text has notable properties of information density and such. For plain numbers (especially random numbers), there’s no real value in it. This is why “scanning the index” doesn’t work well on these kinds of structures compared to a classic B-Tree.
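To make that contrast concrete, here’s a toy sketch in Python. This is my own broad-brush illustration, nothing like the real on-disk structures or PostgreSQL’s actual GIN layout: a B+Tree-style leaf chain that holds whole keys you can scan, versus a little prefix trie where the key only exists as the path down the tree.

    # Toy illustration only: real indexes page, balance and compress this very differently.

    # B+Tree-ish: leaves hold the full key values (plus row pointers) and are
    # chained together in order, so you can scan them without touching the table.
    btree_leaves = [
        {"keys": ["ABC123", "ABD456"], "next": 1},
        {"keys": ["XYZ789", "XYZ999"], "next": None},
    ]

    def index_only_scan(leaves, needle):
        """Walk the leaf chain and match '%needle%' using only the index pages."""
        hits, page = [], 0
        while page is not None:
            leaf = leaves[page]
            hits += [key for key in leaf["keys"] if needle in key]  # the keys live in the leaves
            page = leaf["next"]
        return hits

    # GIN/trie-ish (broad brush): nodes hold key fragments; the full key only
    # exists as the path, and the end of the branch is just a row pointer.
    trie = {"the": {"": "ptr:the", "m": "ptr:them", "ir": "ptr:their", "spian": "ptr:thespian"}}

    def trie_lookup(word):
        for prefix, children in trie.items():
            if word.startswith(prefix):
                return children.get(word[len(prefix):])  # reassemble the key from the path
        return None

    print(index_only_scan(btree_leaves, "XYZ"))  # ['XYZ789', 'XYZ999']
    print(trie_lookup("thespian"))               # 'ptr:thespian'

The point being: in the first shape the key material is sitting right there in the leaves for you to scan; in the second it isn’t, it’s smeared across the path.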
Nor does scanning work for a hash index, since a hash index stores the hash, not the key value. There’s nothing of value there to scan.
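Same toy style, just to show the shape (again my own sketch, not how any particular engine lays out its hash buckets):

    # Toy illustration: roughly the shape of a hash index, hash(key) -> row pointer.
    hash_index = {hash("ABC123"): "ptr:row1", hash("XYZ789"): "ptr:row2"}

    def hash_lookup(key):
        return hash_index.get(hash(key))   # exact-match only

    print(hash_lookup("XYZ789"))           # 'ptr:row2'
    # No leaf chain, no key text, no order: LIKE '%XYZ%' can't be answered from this.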
I simply can’t speak to the applicability of indexes on low-cardinality columns, as those run into different issues not directly related to searching the index. If you have 10M rows with a STATUS column of 1 or 2, and an index on that column, then you have a two-value index with a bazillion row pointers. Some systems (I can’t speak to PG in this regard) degenerate in this kind of use case, since the index is more or less designed to work great on unique identifiers rather than low-cardinality values; the representation of huge record-pointer lists may just not be handled well as an edge case. Historically, I try to avoid low-cardinality indexes, simply because I’ve had problems with them in the past (yes, decades ago, but lessons learned). In those cases I force high cardinality: here, maybe create a combined index of STATUS and ID, which turns an ostensibly low-cardinality index into a unique one. This is very important for heavily skewed indexes (like where most of the statuses are 1, but you mostly just want to look for the 2’s). But then it’s a matter of convincing the optimizer that the index isn’t utterly worthless no matter what you query for, so it doesn’t just table scan anyway.
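For what it’s worth, here’s that STATUS+ID idea as a toy sketch (Python, made-up scaled-down data): appending the ID makes every entry unique, and all of the rare 2’s end up in one contiguous key range.

    # Toy illustration with made-up data: 1,003 rows standing in for 10M skewed rows.
    from bisect import bisect_left, bisect_right

    rows_status = [1] * 1000 + [2] * 3                             # almost everything is status 1
    index = sorted((s, rid) for rid, s in enumerate(rows_status))  # composite (STATUS, ID) key

    lo = bisect_left(index, (2,))                 # first (2, *) entry
    hi = bisect_right(index, (2, float("inf")))   # just past the last (2, *) entry
    print([rid for _, rid in index[lo:hi]])       # [1000, 1001, 1002]

The rare status-2 rows come straight out of a tiny slice of the index, without wading through the millions of status-1 entries.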