Re: Sort causes system to freeze

Richard Huxton <dev@xxxxxxxxxxxx> · Tue, 02 Dec 2008 09:59:04 +0000

Don't reply to another message when starting a new thread. People will
miss your message.

Craig James wrote:
> Maybe this is an obviously dumb thing to do, but it looked reasonable to
> me.

Looks reasonable here too - except I'm not sure what I'd do with 2
million rows of sorted table in my console. I'm guessing you're piping
the output into something.

>  The problem is, the seemingly simple sort below causes a fairly
> powerful computer to completely freeze for 5-10 minutes.  During the
> sort, you can't login, you can't use any shell sessions you already have
> open, the Apache server barely works, and even if you do "nice -20 top"
> before you start the sort, the top(1) command comes to a halt while the
> sort is proceeding!  As nearly as I can tell, the sort operation is
> causing a swap storm of some sort -- nothing else in my many years of
> UNIX/Linux experience can cause a "nice -20" process to freeze.

Nothing should cause that to your machine. I've never seen "top" just
freeze unless you set up some sort of fork-bomb and ramp the load up so
fast it can't cope. Oh, and nice-ing the client isn't going to do
anything to the backend actually doing the sorting.

> The sort operation never finishes -- it's always killed by the system. 
> Once it dies, everything returns to normal.

You're running out of memory then. It'll be the out-of-memory killer
(assuming you're on Linux).

> This is 8.3.0.  (Yes, I'll upgrade soon.)

Make "soon" more urgent than it has been up to now - no point in risking
all your data to some already fixed bug is there? Unless you've been
carefully tracking the release notes and have established that there's
no need in your precise scenario.

> Is this a known bug, or do I
> have to rewrite this query somehow?  Maybe add indexes to all four
> columns being sorted?

Indexes won't necessarily help if you're sorting the whole table. Maybe
if you had one on all four columns.

> => explain select * from plus order by supplier_id, compound_id, units,
> price;

> max_connections = 1000
> shared_buffers = 2000MB
> work_mem = 256MB

So can you support (1000 * 256 * 2) + 2000 MB of RAM?

> effective_cache_size = 4GB

...while leaving 4GB free for disk caching?

> Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID
> controller, 8 disks as RAID10

It appears not. Remember that work_mem is not only per-connection, a
single query can use multiples of it (hence the *2 above). If you
genuinely have a lot of connections I'd drop it down to (say) 4MB to
make sure you don't swap on a regular basis (should probably be even
lower to be truly safe).

Then, for the odd case when you need a large value, issue a SET work_mem
before the query.

-- 
  Richard Huxton
  Archonet Ltd

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance