In response to Stephen Frost <sfrost@xxxxxxxxxxx>:

> * Bill Moran (wmoran@xxxxxxxxxxxxxxxxxxxxxxx) wrote:
> > What I'm fuzzy on is how to know definitively when I'm overflowing
> > work_mem?  Obviously, if work_mem is exhausted by a particular
> > query, temp files will be created and performance will begin to suck,
>
> I don't believe this is necessarily *always* the case.  There are
> instances in PostgreSQL where it will just continue to allocate memory
> beyond the work_mem setting.  This is usually due to poor statistics
> (you changed the data in the table dramatically and haven't run
> analyze, or you never ran analyze on the table at all, or the
> statistics gathering values are set too low to capture enough
> information about the data, etc).  It would be nice if it was possible
> to have this detected and logged, or similar.  Additionally, work_mem
> isn't actually a per-query thing, aiui, it's more like a per-node in
> the planner thing.  That is to say that if you have multiple sorts
> going on, or a sort and a hash, *both* of those expect to be able to
> use up to work_mem amount of memory.

I'm aware of that.  It's one of the reasons I asked about monitoring
its usage.  I mean, if I could be sure that each process only used
work_mem amount of space, it would be pretty easy to run some
calculations and go to management and say, "these servers need X amount
of RAM for optimal performance ..."

As it is, I'm trying to find the most complex queries and estimate how
many joins and sorts there are and how much that's going to add up to.
It'd be nice to be able to crank up the debugging and have PostgreSQL
say:

    QUERY 0: total work_mem: aaaaaaaa bytes
      JOIN 0: xxxxx bytes
      JOIN 1: yyyyy bytes
      ...

Perhaps it's in there somewhere ... I haven't experimented with
cranking the logging up to maximum yet (one candidate is sketched
further down).  If it's missing, I'm hoping to have some time to add
it.  Adding debugging to PostgreSQL is a pretty easy way to learn how
the code fits together ...

> Also, another point you might want to consider how to handle is that
> work_mem has no bearing on libpq, and I don't recall there being a way
> to constrain libpq's memory usage.  This has been an issue for me just
> today when I forgot a couple parameters to a join, which caused a
> cartesian product result and ended up running the box out of memory.
> Sure, it's my fault, and unlikely to happen in an application, but it
> still sucks. :)  It also managed to run quickly enough that I didn't
> notice what was happening. :/  Of course, the server side didn't need
> much memory at all to generate that result.  Also, libpq stores
> everything in *its* memory before passing it to the client.  An
> example scenario of this being kind of an issue is psql: you need
> double the memory size of a given result, because the result is first
> completely grabbed and stored in libpq and then sent to your pager
> (eg: less), which then sucks it all into memory again.  In
> applications (and I guess psql, though I never think of it, and it'd
> be nice to have as a configurable option if it isn't already ...) you
> can use cursors to limit the amount of memory libpq uses.
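If it helps anyone following the thread, the cursor approach Stephen
describes looks roughly like this (table name and batch size are
invented for illustration):

    BEGIN;
    -- The result set stays on the server; libpq only ever holds
    -- one batch of rows at a time.
    DECLARE big_cur CURSOR FOR
        SELECT * FROM some_big_table;
    FETCH FORWARD 1000 FROM big_cur;  -- repeat until it returns no rows
    CLOSE big_cur;
    COMMIT;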
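And to follow up on my own point above about cranking up the logging:
one thing I haven't tried yet is trace_sort, which, if I'm reading the
docs right, will at least tell you whether a given sort fit in work_mem
or spilled to disk (assuming the build has TRACE_SORT enabled, which I
believe is the default).  It only covers sorts, not hashes, so it's a
partial answer at best:

    SET trace_sort = on;
    SET client_min_messages = log;  -- so the messages show up in psql
    SELECT * FROM some_big_table ORDER BY some_column;
    -- Look for something like "external sort ended, NN disk blocks
    -- used" (spilled) versus "internal sort ended" (fit in work_mem).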
In our case, the database servers are always dedicated, and the
application side always runs on a different server.  This is both a
blessing and a curse: on the one hand, I don't have to worry about any
client apps eating up RAM on the DB server.  On the other hand, last
week we found a place where a query with lots of joins was missing a
key WHERE clause: it was pulling something like 10X the number of
records it needed, then limiting it further on the client side.
Optimizing this sort of thing is something I enjoy.

> As these are new things (both the temp file creation logging and the
> work_mem overflow detection, I believe), this discussion is probably
> more appropriate for -hackers.

True.  It started out here because I wasn't sure that the stuff didn't
already exist, and I was curious how others were doing it.  When I've
had some more opportunity to investigate work_mem monitoring, I'll
start the discussion back up on -hackers.

> > That leads to my other question.  Assuming I've got lots of
> > connections (which I do), how can I determine if work_mem is too
> > high?  Do server processes allocate it even if they don't actually
> > use it?  Is the only way to find out to reduce it and see when it
> > starts to be a problem?  If so, that leads back to my first
> > question: how can I be sure whether temp files were created or not?
>
> Yeah, look for swappiness ...  It'd be nice to be able to get memory
> statistics on queries which have been run, though ...
>
> > My goal is to set work_mem as small as is possible for the most
> > common queries, then force the developers to use "set work_mem to x"
> > to adjust it for big queries.
>
> Sounds like an excellent plan.  Be careful though: work_mem settings
> can affect query plans, and they may discover that if set high enough
> the planner will, for example, do a hashjoin which is much faster than
> sorting and merge-joining, but takes a lot of memory ...  They may say
> "hey, I like it being fast" but not consider what happens when a lot
> of those queries run at once ...

Well ... as long as those kinds of issues exist, I'll have a job ;)

--
Bill Moran
Collaborative Fusion Inc.
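P.S. For the record, the kind of per-query override I want the
developers using, and a way to watch the plan flip Stephen warns about,
would look something like this (tables and sizes are purely
illustrative, and the unit suffix needs 8.2 or later; a plain
SET work_mem ... / RESET work_mem pair works too, session-wide):

    BEGIN;
    SET LOCAL work_mem = '256MB';  -- scoped to this transaction only
    -- Re-run the EXPLAIN at different work_mem settings and watch the
    -- plan flip, e.g. from sort + merge join to hash join:
    EXPLAIN
    SELECT o.id, sum(li.amount)
      FROM orders o
      JOIN line_items li ON li.order_id = o.id
     GROUP BY o.id;
    COMMIT;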