Re: Compound Indexes

Chris Browne <cbbrowne@xxxxxxx> · Tue, 14 Aug 2007 16:25:55 -0400

phoenix.kiula@xxxxxxxxx ("Phoenix Kiula") writes:
> I have a table with ten columns. My queries basically one column as
> the first WHERE condition, so an index on that column is certain. But
> the columns after that one vary depending on end-user's choice (this
> is a reporting application) and so does the sorting order.
>
> In MySQL world, I had sort_buffer in the config file, and I made a
> compound index with the columns most often used in these types of
> queries. So my index looked like:
>
>   INDEX idx_trades(id, t_id, c_id, s_id,  t_brief, created_on);
>
> This has five columns in it. While reading the pgsql documentation, I
> gather than anything beyond three columns offers diminishing benefits.
>
> My queries will look like these:
>
>    SELECT * from trades where id = 99999
>    and c_id = 9999
>    ORDER by s_id;
>
>    SELECT * from trades where id = 99999
>    and s_id = 99990
>    ORDER by created_on desc ;
>
>    SELECT * from trades where id = 99999
>    and s_id = 99990
>    and t_brief ~* 'more|than|one|word'
>    ORDER by created_on desc ;
>
> So my question: how does PGSQL optimize its sorts? If I were to index
> merely the columns that are most commonly used in the reporting WHERE
> clause, would that be ok? Some ofthese columns may be "TEXT" type --
> how should I include these in the index (in MySQL, I included only the
> first 100 words in the index).

If you have only these three sorts of queries, then I would speculate
that the following indices *might* be useful:

 create idx1 on trades (id);
 create idx2 on trades (c_id);
 create idx3 on trades (s_id);
 create idx4 on trades (created_on);
 create idx5 on trades (created_on) where t_brief ~* 'more|than|one|word';
 create idx6 on trades (id, s_id) where t_brief ~* 'more|than|one|word';

(I'm assuming with idx5 and idx6 that you were actually searching for
'more|than|one|word'; if what is searched for can vary, then idx5/idx6
are worthless.)

You could try adding them all, and check out which of them are
actually used by the query planner.  And eventually drop out the
irrelevant ones.

PostgreSQL has a rather sophisticated query planner (pretty much
"rocket science," compared to MySQL), and it is even possible that it
would use multiple of those indices simultaneously for some of the
queries.  Which indexes, if any, it will use will vary from query to
query based on the parameters in the query.

You can determine the query plan by prefixing the query with the
keyword "EXPLAIN."

Suppose you add the above 6 indexes, you could get query plans via
running the following:

>    EXPLAIN SELECT * from trades where id = 99999
>    and c_id = 9999
>    ORDER by s_id;
>
>    EXPLAIN SELECT * from trades where id = 99999
>    and s_id = 99990
>    ORDER by created_on desc ;
>
>    EXPLAIN SELECT * from trades where id = 99999
>    and s_id = 99990
>    and t_brief ~* 'more|than|one|word'
>    ORDER by created_on desc ;

You may want to post the output to the list; learning to read query
planner output is a bit of an art, and you won't necessarily make the
right sense of the results on day #1...
-- 
output = ("cbbrowne" "@" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
Consciousness - that annoying time between naps.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster