On Wed, Mar 4, 2009 at 11:18 AM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:

> You may have decreased performance in your batch jobs with the lower
> work_mem setting.

That would be why I recommended benchmarking queries that need more
memory and setting work_mem for those queries alone.

> Additionally, the fact that you haven't had swap storm issues so far means
> that although there is certain risk of an issue, its probably a lot lower
> than what has been talked about here so far.

No, it means you and the OP are guessing at what's a good number without
any actual proof of it.  Guessing is not a particularly good method for
setting work_mem, especially on a server with 2000+ connections.

> Without a change in client behavior (new queries or large change in data)
> a change in load alone is very unlikely to cause a problem.

That is demonstrably incorrect.  If the average number of live queries out
of the 2000 connections is currently 10, and an increase in load makes it
500, there is a very REAL chance of running the server out of memory.

> So take your time to do it right.

I believe I made mention of benchmarking queries above and in my first
post.  But doing it right does NOT mean setting work_mem to 2G then
whittling it down as your server crashes under load.

> I disagree with the knee-jerk "change it now!" response.
> The very fact you have gotten this far means it is not as risky as the
> bare settings indicate.

Sorry, but I disagree back at you, and it's not a knee-jerk reaction, it's
a reaction honed from years of watching supposedly stable postgresql
servers crash and burn under slashdot effect type loads.

> Definitely plan on improving and testing out work_mem variants soon, but
> a hasty change to a small value might cause your batch jobs to take much
> longer -- what is your risk if they take too long or don't complete in
> time?  That risk is yours to assess -- if its not much of a risk, then by
> all means lower work_mem soon.  But if it is business critical for those
> batch jobs to complete within some time frame, be careful.

Sorry, but that's backwards.  Unless the batch jobs are the only important
thing on this server, running it with work_mem=2G is asking for trouble
under any real kind of load.  It's sacrificing stability for some unknown
and quite probably minimal performance improvement.

It seems a lot of your post is based on either hoping for the best, or
assuming access patterns won't change much over time.  Do you test at
higher and higher parallel loads until failure occurs and then figure out
how to limit that type of failure?  I do, because I can't afford to have
my db servers crash and burn midday under peak load.  And you never know
when some app is gonna start suddenly spewing things like unconstrained
joins due to some bug, and if you've got work_mem set to 1G your server IS
gonna have problems.
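
To put some meat on what I mean by keeping the global setting small and
raising work_mem only for the queries that need it, here's roughly what it
looks like (the table name and numbers are made up, it's just a sketch):

    -- postgresql.conf keeps a conservative server-wide default,
    -- e.g. work_mem = 4MB, so 2000 idle-ish connections can't hurt you.

    BEGIN;
    -- SET LOCAL only lasts until COMMIT/ROLLBACK, so a pooled
    -- connection can't leak the big setting to other sessions.
    SET LOCAL work_mem = '512MB';

    SELECT customer_id, sum(amount)
      FROM big_batch_table        -- hypothetical batch-job table
     GROUP BY customer_id;
    COMMIT;

Keep in mind work_mem is per sort/hash step per query, not per server, so
500 active backends each doing one big sort at 1G is potentially 500G of
RAM, not 1G.  That's the whole reason the per-query approach is the safe
one.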