Re: I'd like to learn a bit more about how indexes work

Dann Corbit <DCorbit@xxxxxxxxx> · Wed, 6 Jun 2012 00:21:37 +0000

-----Original Message-----
From: Mike Christensen [mailto:mike@xxxxxxxxxxxxx] 
Sent: Tuesday, June 05, 2012 4:28 PM
To: Dann Corbit
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re:  I'd like to learn a bit more about how indexes work

I'm aware of how "B-Trees" work, but I only understand them on the level of traversing a single tree at a time.  I'm curious how Postgres combines multiple indexes with OR and AND clauses.

I've done some Google searches, however I can't find anything basic.
Everything I've found assumes you already have knowledge of terms such as "hash join, aggregate join, etc".

At this point I'm not looking at learning how the optimizer works..
>>
"How the optimizer works" is the answer to your question.
The plan of attack for forming a query is a function of the optimizer.

One possible plan  for " WHERE data = key1 OR data = key2 " is something along the lines of:
SEEK("key1")
While key == key1 accumulate rows into the result set
   GetNextRow()
SEEK("key2")
While key == key2 accumulate rows into the result set
   GetNextRow()

However, if the table is tiny (suppose it is ten rows and fits into memory) then a table scan might be cheaper.

Here at CONNX, I have written a hashed btree search that tends to be cheaper than using a clustered index if there are no qualifiers on the join.
For instance
SELECT a.*, b.* from table1 a, table2 b WHERE a.unique_index = b.foreign_key

It will be faster to actually not use the index.  Whereas if there are additional where clause criteria such as:
SELECT a.*, b.* from table1 a, table2 b WHERE a.unique_index = b.foreign_key AND a.unique_index IN (k1, k2, k3, k4,..., kn, kn+1)
It will probably be faster to use the index unless the list of items is a substantial proportion of the possible data values.

The point is that there is not a simple formula that describes how data values are retrieved from the database.  The method of collection is decided by the optimizer.

It isn't always cheaper to use an index.  In fact, sometimes building an index is a complete waste of time.
For instance, suppose that you have a column named 'sex' that can contain the values 'F', 'M', and 'U'
If you built an index on that column it won't help you to find all the males faster than a table scan because the data is not specific enough so that the total number of pages of disk that are read would be more with the index than if the index were not used.
So, I suggest that possibly the articles you do not want to read are the very ones that will answer your questions.

On the other hand, it is not unlikely that I simply do not understand the questions that you are asking.
<<

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general