Hi Merlin,
Thanks for the input. Please see below ...
Merlin Moncure wrote:
On 7/5/06, andy rost <andy.rost@xxxxxxxx> wrote:
fsync = on # turns forced synchronization
have you tried turning this off and measuring performance?
No, not yet. We're trying a couple of outher avenues before manipulating
this parameter.
stats_command_string = on
I would turn this off unless you absoltely require it. It is
expensive for what it does.
We've turned this off
a) All 4 CPUs are nearly always 0% idle;
b) The system load level is nearly always in excess of 20;
I am guessing your system is spending all it's time syncing. If so,
it's solvable (again, just run fsync=off for a bit and compare).
We've reduced the load significantly primarily by moving pg_xlog to its
own drive and by increasing the effective cache size. While we still see
high load levels, they don't last very long. We're trying improve
performance from several angles but are taking it one step at a time.
Eventually we'll experiment with fsynch
c) the output from vmstat -w 10 looks like:
procs memory page disks faults
cpu
r b w avm fre flt re pi po fr sr aa0 aa1 in sy cs us
sy id
21 0 3 1242976 327936 2766 0 0 0 2264 0 2 2 17397 140332
104846 18 82 1
is that 100k context switches over 10 seconds or one second? that
might be something to check out. pg 8.1 is regarded as the solution
to any cs problem, though.
According to man top, that's 100K per second. I'm interested in your
recommendation but am not sure what "pg 8.1" references
NOTE - small user demands and high system demands
d) Running top indicates a significant number or sblock states and
occasional smwai states;
e) ps auxww | grep postgres doesn't show anything abnormal;
f) ESQL applications are very slow.
We VACUUM ANALYZE user databases every four hours. We VACUUM template1
every 4 hours. We make a copy of the current WAL every minute. We create
a PIT recovery archive daily daily. None of these, individually seem to
place much strain on the server.
your server should be able to handle this easily.
Hopefully I've supplied enough information to start diagnosing the
problem. Any ideas, thoughts, suggestions are greatly appreciated ...
can you please approximate roughly how many transactions per second
your server is handling while you are getting the 20 load condition
(and, if possible, broken down into read and write transactions)?
Do you have any suggestions on how I might obtain these metrics?
merlin
Thanks again Merlin ...
Andy
--
--------------------------------------------------------------------------------
Andrew Rost
National Operational Hydrologic Remote Sensing Center (NOHRSC)
National Weather Service, NOAA
1735 Lake Dr. West, Chanhassen, MN 55317-8582
Voice: (952)361-6610 x 234
Fax: (952)361-6634
andy.rost@xxxxxxxx
http://www.nohrsc.noaa.gov
--------------------------------------------------------------------------------