Re: Surprising no use of indexes - low performance

Nicolas Charles <nicolas.charles@xxxxxxxxxxxxx> · Fri, 15 Feb 2013 10:00:24 +0100

On 14/02/2013 20:27, Jeff Janes wrote:
On Thu, Feb 14, 2013 at 7:35 AM, Nicolas Charles
<nicolas.charles@xxxxxxxxxxxxx>  wrote:
It contains 11018592 entries, with the followinf patterns :
108492 distinct executiontimestamp
14 distinct nodeid
59 distinct directiveid
26 distinct ruleid
35 distinct serial
How many entries fall within a typical query interval of executiontimestamp?

Around 65 000 entries
.
I'm surprised that the executiontimestamp index is not used, since it seems
to be where most of the query time is spent.
I do not draw that conclusion from your posted information.  Can you
highlight the parts of it that lead you to this conclusion?
The index scan are on nodeid_idx and configurationruleid_idx, not on 
executiontimestamp
Or am I misreading the output ?

For all my tests, I removed all the incoming logs, so that this table has
only selects and no writes

I'm using Postgres 8.4, on a quite smallish VM, with some process runnings,
A lot of improvements have been made since 8.4 which would make this
kind of thing easier to figure out.  What is smallish?

A VM with 1 core, 2 GB RAM, a single hard drive

with the following non default configuration
shared_buffers = 112MB
work_mem = 8MB
maintenance_work_mem = 48MB
max_stack_depth = 3MB
wal_buffers = 1MB
effective_cache_size = 128MB
effective_cache_size seems small unless you expect to have a lot of
this type of query running simultaneously, assuming you have at least
4GB of RAM, which I'm guessing you do based on your next comments.

For the sake of the test, the VM got its memory increased, with no 
significant changes

checkpoint_segments = 6

Increasing the shared_buffers to 384, 1GB or 1500MB didn't improve the
performances (less than 10%). I would have expected it to improve, since the
indexes would all fit in RAM
If the indexes fit in RAM, they fit in RAM.  If anything, increasing
shared_buffers could make it harder to fit them entirely in RAM.  If
your shared buffers undergo a lot of churn, then the OS cache and the
shared buffers tend to uselessly mirror each other, meaning there is
less space for non-redundant pages.
Oh !
So I completely misunderstood the meaning of shared_buffer; I figured 
that it was somehow the place where the data would be stored by postgres 
(like indexes)

create index composite_idx on ruddersysevents (executiontimestamp, ruleid,
serial, nodeid);
I wouldn't expect this to work well for this particular query.  Since
the leading column is used in a range test, the following columns
cannot be used efficiently in the index structure.  You should put the
equality-tested columns at the front of the index and the range-tested
one at the end of it.

2/ Removing nodeid from the index did lower again the perf
create index composite2_idx on ruddersysevents (executiontimestamp, ruleid,
serial);

I doubt that 84888.349 vs 83717.901 is really a meaningful difference.

3/ Removing executiontimestamp from the composite index makes the query
performs better at the begining of its uses (around 17 secondes), but over
time it degrades (I'm logging query longer than 20 secondes, and there are
very rare in the first half of the batch, and getting more and more common
at the end) to what is below
If the batch processing adds data, it is not surprising the query
slows down.  It looks like it is still faster at the end then the
previous two cases, right?

Actually, the batch reads data from this table, and writes into another. 
So the content of the table doesn't change at all
And yes, it is faster than the two previous case

So my question is :
"Why *not* indexing the column which is not used makes the query slower over
time, while not slowing the application?"
I don't know what column you are referring to here.  But it sounds
like you think that dropping the leading column from an index is a
minor change.  It is not.  It makes a fundamentally different index.

I was refering to the executionTimeStamp column. I know it is a HUGE 
change, but it's clearly not behavin the way I thought
With your remark I understand a little better what is going on, and I 
can test better what I'm doing.

Thank you !
Nicolas

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance