Re: Surprising no use of indexes - low performance

Jeff Janes <jeff.janes@xxxxxxxxx> · Thu, 14 Feb 2013 11:27:33 -0800

On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
<nicolas.charles@xxxxxxxxxxxxx> wrote:
>
> It contains 11018592 entries, with the followinf patterns :
> 108492 distinct executiontimestamp
> 14 distinct nodeid
> 59 distinct directiveid
> 26 distinct ruleid
> 35 distinct serial

How many entries fall within a typical query interval of executiontimestamp?

...
>
> I'm surprised that the executiontimestamp index is not used, since it seems
> to be where most of the query time is spent.

I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?

> For all my tests, I removed all the incoming logs, so that this table has
> only selects and no writes
>
> I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,

A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out.  What is smallish?

> with the following non default configuration
> shared_buffers = 112MB
> work_mem = 8MB
> maintenance_work_mem = 48MB
> max_stack_depth = 3MB
> wal_buffers = 1MB
> effective_cache_size = 128MB

effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.

> checkpoint_segments = 6
>
> Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
> performances (less than 10%). I would have expected it to improve, since the
> indexes would all fit in RAM

If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.

>
> create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
> serial, nodeid);

I wouldn't expect this to work well for this particular query.  Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure.  You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.

>
> 2/ Removing nodeid from the index did lower again the perf
> create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
> serial);

I doubt that 84888.349 vs 83717.901 is really a meaningful difference.

> 3/ Removing executiontimestamp from the composite index makes the query
> performs better at the begining of its uses (around 17 secondes), but over
> time it degrades (I'm logging query longer than 20 secondes, and there are
> very rare in the first half of the batch, and getting more and more common
> at the end) to what is below

If the batch processing adds data, it is not surprising the query
slows down.  It looks like it is still faster at the end then the
previous two cases, right?

> So my question is :
> "Why *not* indexing the column which is not used makes the query slower over
> time, while not slowing the application?"

I don't know what column you are referring to here.  But it sounds
like you think that dropping the leading column from an index is a
minor change.  It is not.  It makes a fundamentally different index.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance