I have one query which does not run very often. Sometimes months may pass between runs. When it does get executed, however, it scans approximately 100 identically-structured tables (a form of partitioning), extracts and groups on a subset of the columns, and creates a new table. The individual table queries have no WHERE clauses; this is a full table scan for every table. I've tried all sorts of things to improve the performance, and the query can take a /very/ long time. We are talking about approximately 175GB of data before grouping/summarizing.

This is on PG 8.4.8 on Linux, with 16GB of "real" RAM.

Most recently, I enabled trace_sort, disabled hash aggregation[1], and set a large work_mem (normally very small; in this case I tried everything from 8MB to 256MB, and even 1GB and 2GB). In the logs, I saw this:

external sort ended, 7708696 disk blocks used: CPU 359.84s/57504.66u sec elapsed 58966.76 sec

Am I to understand that the CPU portion of the sorting took only 6 minutes, but the sort itself took almost 16.5 hours and used approximately 60GB of disk space?

The resulting summary table is about 5GB in size as reported by \d+ in psql (and pg_relation_size). The underlying storage is ext4 on a hardware RAID 10 with a BBU.

What sorts of things should I be looking at to improve the performance of this query? Is my interpretation of that log line totally off base?

[1] If I don't disable hash aggregation and work_mem is over 8MB, the memory allocation explodes to the point where postgresql wants dozens of gigabytes of memory. I've tried setting the statistics target as high as 1000 without benefit.

--
Jon
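
P.S. For concreteness, here is a rough sketch of the shape of the query and the session settings described above. The table and column names are invented for illustration only; the real query differs, but the pattern (scan every partition, group, write a new table) is the same:

    SET trace_sort = on;         -- log sort activity
    SET enable_hashagg = off;    -- hash aggregation disabled, per [1]
    SET work_mem = '256MB';      -- one of the values tried

    CREATE TABLE summary AS
    SELECT col_a, col_b, sum(col_c) AS total
    FROM (
        SELECT col_a, col_b, col_c FROM part_001
        UNION ALL
        SELECT col_a, col_b, col_c FROM part_002
        UNION ALL
        -- ... roughly 100 partitions in all ...
        SELECT col_a, col_b, col_c FROM part_100
    ) AS all_parts
    GROUP BY col_a, col_b;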
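
P.P.S. For reference, the arithmetic behind my reading of that log line, assuming the default 8 kB block size:

    SELECT pg_size_pretty(7708696::bigint * 8192);  -- disk blocks used: about 59 GB
    SELECT 58966.76 / 3600.0;                       -- elapsed: about 16.4 hours
    SELECT 359.84 / 60.0;                           -- the "359.84s" figure: about 6 minutes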