On Fri, Sep 13, 2013 at 10:52 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
On Thu, Sep 12, 2013 at 3:06 PM, David Whittaker <dave@xxxxxxxxxx> wrote:
> Hi All,
>
> We lowered shared_buffers to 8G and increased effective_cache_size
> accordingly. So far, we haven't seen any issues since the adjustment. The
> issues have come and gone in the past, so I'm not convinced it won't crop up
> again, but I think the best course is to wait a week or so and see how
> things work out before we make any other changes.
>
> Thank you all for your help, and if the problem does reoccur, we'll look
> into the other options suggested, like using a patched postmaster and
> compiling for perf -g.
>
> Thanks again, I really appreciate the feedback from everyone.

Interesting -- please respond with a follow up if/when you feel
satisfied the problem has gone away. Andres was right; I initially
mis-diagnosed the problem (there is another issue I'm chasing that has
a similar performance presentation but originates from a different
area of the code).

That said, if reducing shared_buffers made *your* problem go away as
well, then this is more evidence that we have an underlying contention
mechanic that is somehow influenced by the setting. Speaking frankly,
under certain workloads we seem to have contention issues in the
general area of the buffer system. I'm thinking (guessing) that the
problem is that usage_count is getting incremented faster than the buffers
are getting cleared out, which is then causing the sweeper to spend
more and more time examining hotly contended buffers. This may make
no sense in the context of your issue; I haven't looked at the code
yet. Also, I've been unable to cause this to happen in simulated
testing. But I'm suspicious (and dollars to doughnuts '0x347ba9' is
spinlock related).
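
To make the guessed mechanism concrete, here is a toy, single-threaded model of a clock-sweep replacement policy. It is a sketch under stated assumptions, not PostgreSQL's actual buffer manager code: the class and method names are hypothetical, MAX_USAGE_COUNT = 5 mirrors PostgreSQL's BM_MAX_USAGE_COUNT, and it models no locking at all, so it shows only why hot buffers make each eviction scan longer, not the spinlock contention suspected above.

```python
from dataclasses import dataclass

# Assumption: cap mirrors PostgreSQL's BM_MAX_USAGE_COUNT (5); this is a toy model.
MAX_USAGE_COUNT = 5


@dataclass
class Buffer:
    usage_count: int = 0
    pinned: bool = False


class ClockSweep:
    """Hypothetical toy clock sweep; not PostgreSQL's StrategyGetBuffer()."""

    def __init__(self, n_buffers: int) -> None:
        self.buffers = [Buffer() for _ in range(n_buffers)]
        self.hand = 0
        self.examined = 0  # total buffers inspected by the sweep so far

    def access(self, i: int) -> None:
        # A buffer hit bumps usage_count, saturating at MAX_USAGE_COUNT.
        b = self.buffers[i]
        b.usage_count = min(b.usage_count + 1, MAX_USAGE_COUNT)

    def evict(self) -> int:
        # Advance the hand, decrementing usage_count on unpinned buffers,
        # until an unpinned buffer with usage_count == 0 is found.
        pinned_streak = 0
        while True:
            idx = self.hand
            b = self.buffers[idx]
            self.hand = (self.hand + 1) % len(self.buffers)
            self.examined += 1
            if b.pinned:
                pinned_streak += 1
                if pinned_streak >= len(self.buffers):
                    raise RuntimeError("no unpinned buffers available")
                continue
            pinned_streak = 0
            if b.usage_count == 0:
                return idx
            b.usage_count -= 1


if __name__ == "__main__":
    cold = ClockSweep(100)
    cold.evict()
    hot = ClockSweep(100)
    for _ in range(MAX_USAGE_COUNT):
        for i in range(100):
            hot.access(i)
    hot.evict()
    print(cold.examined, hot.examined)  # 1 501
```

With 100 cold buffers the first victim is found after inspecting one buffer; when every buffer sits at the usage_count cap, the hand must make five full passes before anything reaches zero. In a real server each of those inspections touches shared state, which is one plausible reason a hot buffer pool could amplify contention, but the model itself cannot confirm that.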

Anyways, thanks for the report and (hopefully) the follow up.

merlin

You guys have taken the time to help me through this; following up is the least I can do. So far we're still looking good.