On 06/08/2011 06:30 AM, tv@xxxxxxxx wrote:
shared_buffers | 16MB
work_mem | 250MB
This seems a bit suspicious. Are you sure you want to keep the
shared_buffers so small and work_mem so large at the same time? There
probably are workloads where this is the right thing to do, but I doubt
this is the case. Why have you set it like this?
I must concur in this case. I can't imagine any scenario where this
makes sense. Work-mem is allocated on a per-sort basis, not just per
session or transaction. So a large query could allocate several of these
and run your system out of memory and cause the OOM killer to start
causing trouble.
I don't have much experience with running Pg on AWS, but I'd try to
increase the shared buffers to say 512MB and decrease the work_mem to
16MB (or something like that).
Easily good minimums. But it looks like your AWS only has 1GB of RAM
(based on your effective_cache_size), so you may only want to increase
it to 256MB. That said, reduce your work_mem to 8MB to start, and
increase it in 4MB increments if it's still too low.
With a setting of 16MB, it has to load data in and out of memory
constantly. Even if the host OS has cached every single block you'll
ever use, that's only the raw table contents. Processing hundreds of
thousands of rows still takes time, you just saved yourself the effort
of fetching them from disk, shared_buffers is still necessary to do
actual work.
Now... I have some issues with your queries, which are likely the fault
of the Django ORM, but still consider your analyze:
> http://explain.depesz.com/s/vEx
Your bitmap index scan on logparser is hilarious. The estimates are
fine. 237k rows in 47ms when it expected 217k. If your table really does
have 815k rows in it, that's not very selective at all. Then it adds a
heap scan for the remaining where conditions, and you end up with 100k
rows it then has to sort. That's never going to be fast. 600ms actually
isn't terrible for this many rows, and it also explains your high CPU.
Then your next one:
> http://explain.depesz.com/s/Rhb
700ms, mostly because of the HashAggregate caused by grouping by
round(((seconds_since_start / 42)). You're aggregating by a calculation
on 100k rows. Again, this will never be "fast" and 700ms is not terrible
considering all the extra work the engine's doing. Again, your index
scan returning everything and the kitchen sink is the root cause. Which
also is evidenced here:
> http://explain.depesz.com/s/JUo
And here:
http://explain.depesz.com/s/VZA
Everything is being caused because it's always using the
ogparser_entry_event_type_like index to fetch the initial 200k rows. The
only way to make this faster is to restrict the rows coming back. For
instance, since you know these values are coming in every day, why
search through all of history every time?
Why not get your timestamp column involved? Maybe you only need to look
at Attack, DoT Tick, and Critical Attack event types for the last day,
or week, or even month. That alone should drastically reduce your row
counts and give the engine a much smaller data set to aggregate and sort.
The thing is, the way your queries are currently written, as you get
more data, this is just going to get worse and worse. Grabbing a quarter
of a table that just gets bigger every day and then getting aggregates
(group by, etc) is going to get slower every day unless you can restrict
the result set with more where clauses. If you need reports on a lot of
this data on a regular basis, consider running a nightly or hourly batch
to insert them into a reporting table you can check later.
There's a lot you can do here.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx
______________________________________________
See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance