Additionally, the fact that you haven’t had swap storm issues so far means that although there is certain risk of an issue, its probably a lot lower than what has been talked about here so far.
Without a change in client behavior (new queries or large change in data) a change in load alone is very unlikely to cause a problem. So take your time to do it right. I disagree with the knee-jerk “change it now!” response. The very fact you have gotten this far means it is not as risky as the bare settings indicate.
Definitely plan on improving and testing out work_mem variants soon, but a hasty change to a small value might cause your batch jobs to take much longer — what is your risk if they take too long or don’t complete in time? That risk is yours to assess — if its not much of a risk, then by all means lower work_mem soon. But if it is business critical for those batch jobs to complete within some time frame, be careful.
If you know what you are doing, and are careful, your work_mem is probably, but not necessarily too high.
It all depends on how much you know about your clients.
For example 2010 connections with 500MB work_mem is not always a problem. If you know 2000 of those are from an application that accesses with a user that can only see small tables, and you know what queries those are, it may be perfectly safe. For example, I’ve dealt with an application that had a couple thousand connections, 95% were idle at any time (connection pool much like those likely from your Jboss apps). The queries these ran were a limited set of about 20 statements that all accessed by unique key on small-ish sized tables (<30MB) with few joins. There were tons of connections, but they:
1: hardly did anything, most were idle. On 75% of the connections, a query set was run exactly once every 15 minutes selecting * from small (sub 1MB) tables.
2: the more active connections (20%) did small select queries on single rows accessed by primary key.
So, the calculation max connections * work_mem is utterly inappropriate for that sort of workload. Yes, in theory, those connections could use work_mem * some factor of memory — if they changed their queries, and accessed other tables. In practice — nowhere close.
The remaining few connections(~5) were batch jobs that needed ~800MB of work_mem or else the performance would stink. And they didn’t need 800MB of work_mem for real (the hashes used ~250MB) they needed a SETTING of 800MB because the planner is incapable of estimating row counts properly with partitioned table access.
Both applications were not able to configure their own work_mem for quite some time (getting client applications to change is not always a quick process).
But the risk of having a large setting, even with 2000 connections was low. The risk of changing it too low was very high (batch jobs taking 3 hours instead of 10 minutes). Only 5 ish connections even accessed schemas/tables with lots of data. The remaining couple thousand were constrained in many ways other than work_mem.
In the end I did have swap storms... However it was not caused by work_mem. It was the query planner, which can use GBs of memory per connection planning a query on partitioned tables.
So, my point is that if you don’t know a lot about the database or its clients be very wary of large work_mem settings. If you do, and have a lot of control or knowledge about your clients, the work_mem * max_connections calculation is inappropriate.
The detailed formula is along the lines of:
SUM_i [work_mem_i * active_connecions_i] (for each ‘type’ of connection i).
If you don’t know enough about your connections, then the conservative estimate is work_mem * max_connections.
A single query has the potential of using multiples of work_mem depending on how many concurrent hashes / sorts are in a query, so the above is not quite right either.
Is there a way to make a particular database user have a user-local work_mem setting without having the client change their code? You could then have each application have its own user, with its own default setting. The batch jobs with few connections can get much larger work_mem than the Jboss ones. This would be especially powerful for applications that can’t change or that use higher level tools for db access that make it impossible or very difficult to send non-standard commands like “SET”.
On 3/4/09 4:46 AM, "Flavio Henrique Araque Gurgel" <flavio@xxxxxxxxxxxxx> wrote:
----- "Scott Marlowe" <scott.marlowe@xxxxxxxxx> escreveu:
> Oh my lord, that is a foot gun waiting to go off. Assuming 2k
> connections, and somehow a fair number of them went active with big
> sorts, you'd be able to exhaust all physical memory with about 8 to
> 16 connections. Lower work_mem now. To something like 1 to 4 Meg. Do
> not pass go. If some oddball query really needs a lot of work_mem,
> and benchmarks show something larger work_mem helps, consider raising
> the work_mem setting for that one query to something under 1G (way
> under 1G) That makes it noticeably faster. Don't allocate more than a
> test shows you helps.
Thanks a lot Scott. That's what I thought in the beginning but was very doubtful since the documentation is a bit odd regarding this point and several bloggers talk about increasing this value up to 250MB. I really think that separating regular non pooled distributed applications and pooled application servers makes a lot of difference in this point.
> Vacuum does not block transactions. unless you're dropping tables or something.
I'll try to separate things and check if the DELETE queries have something related here.
(...)
> what you think they mean. Post the first 20 or so lines from top to
> show us.
Unfortunately I can't do it. The data there is very sensitive (it's a public company here in Brazil) and the server is operated only by selected personal. I just ask for information and give written recomendations. Anyway, I'm going to pay some more attention in this topic.
This is a very interesting implementation of PostgreSQL (3 large databases, heavy load, things growing all the time) and I'll let you all know what happened when tuning it. I'll feedback you after lowering work_mem and changing related settings.
Thanks
Flavio