You should turn on log_checkpoint in the postgresql.conf and confirm the slowdowns are happening around the same time as the checkpoint report gets written to the log files.
Some more details about my current config (almost no activity since boot except the one large update still in progress):
select * from pg_stat_bgwriter;
checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
440 | 51 | 39248820 | 8876042 | 85527 | 3952243 | 48756188
and a couple of hours later:
465 | 56 | 43316686 | 9855720 | 95058 | 3997393 | 53713342
My analysis:
checkpoints_timed increments roughly every 5 minutes, and checkpoints_req increments roughly every 30 minutes.
Around 80% of the checkpoints are timed, rather than requested. But the pauses/freezes are of the order of 30 to 60 seconds apart (roughly aligned with similar slow fsync calls), not 5 minutes apart. There are multiple freezes per checkpoint.
During each checkpoint, 83K * 8K blocks were written out, or 650MB on average per checkpoint. Around 20% were cleaned by the background cleaner. if this occured at anywhere near my drive's average random write speed it should take less than 20 seconds even if 100% random disc blocks.
I don't understand how maxwritten_clean could be as high as 95058, and increment at more than 1 per second. This is a process count, not a buffer count? How often is the background cleaner launched? Does that mean I need to massively increase bgwriter_lru_maxpages, and other bgwriter params? They are currently default values.
On Wed, Dec 9, 2009 at 12:03 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
fsyncs mostly happen as a result of checkpoint activity. I wonderwhether you have done any tuning of checkpoint intervals,
shared_buffers, etc.
I wouldn't describe the values as "tuned" ... more like set and forget ...
postgresql.conf (currently):
max_connections = 50
shared_buffers = 4GB
work_mem = 64MB
maintenance_work_mem = 1GB
synchronous_commit = 'off'
wal_buffers = 8MB
checkpoint_segments = 128
autovacuum = 'on'
effective_cache_size = 12GB
default_statistics_target = 100
constraint_exclusion = 'on'
shared_buffers is 1/4 of RAM, and effective_cache_size is 3/4 RAM. The system is not swapping, and currently 9.5GB of RAM is marked as "Inactive".
checkpoint_segments is large, but in the suggested range of 32 to 256 for large, write-heavy, batch-oriented systems. I've used smaller values in the past, but I got lots of checkpoint warnings in the console.
wal_buffers might be a bit high. I don't fully understand the implications, and how to tune this value, even after reading the tuning guides.
On Wed, Dec 9, 2009 at 4:40 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
Latest firmware for the 50nm version. I haven't found any known issue.
During the pauses, "Activity Monitor" reports no disk activity. But I am not sure if this implies no SATA traffic.
Make them more frequent but smaller, or less frequent but bigger? Which way has less penalty?
I'm very tempted to live dangerously. I would like to actually turn off fsync's during a transaction, and just use the fsync at the very end of the transaction to confirm that the transaction will be permanent. Or maybe use the filesystem journaling function to ensure I can roll-back to a valid state if something bad happens mid-transaction. But these do not seem to be options.
Unfortunately, I have a lot of foreign keys with "ON DELETE CASCADE", and various other triggers. While that greatly simplified my housekeeping code, I can no longer do delete & reinsert.
I can break the big transactions into hundreds or thousands of little transactions. This does let me stop/restart the system at will, but I've yet to see it help my throughput.
Maybe I've overlooked something, but nothing looks too wrong with postgresql.conf to me, based on my reading of the tuning guides. One checkpoint write every 5 minutes (80% timed) doesn't seem too bad. Something else (probably the very slow fsync) is keeping the checkpoint writing rate low, and there appear to be a lot more fsync's than checkpoint segments.
Stephen
On Wed, Dec 9, 2009 at 4:40 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
possible resolutions might include:
*) you might luck out and be dealing with known hopefully corrected
issue with intel drives
Latest firmware for the 50nm version. I haven't found any known issue.
During the pauses, "Activity Monitor" reports no disk activity. But I am not sure if this implies no SATA traffic.
*) tweak checkpoints/bgwriter to try and reduce the penalty of fsync
Make them more frequent but smaller, or less frequent but bigger? Which way has less penalty?
*) disable fsync...risky, but I bet will make the problem go away
I'm very tempted to live dangerously. I would like to actually turn off fsync's during a transaction, and just use the fsync at the very end of the transaction to confirm that the transaction will be permanent. Or maybe use the filesystem journaling function to ensure I can roll-back to a valid state if something bad happens mid-transaction. But these do not seem to be options.
*) optimize your query. maybe try rewriting as delete/insert select
from scratch table?
Unfortunately, I have a lot of foreign keys with "ON DELETE CASCADE", and various other triggers. While that greatly simplified my housekeeping code, I can no longer do delete & reinsert.
I can break the big transactions into hundreds or thousands of little transactions. This does let me stop/restart the system at will, but I've yet to see it help my throughput.
Maybe I've overlooked something, but nothing looks too wrong with postgresql.conf to me, based on my reading of the tuning guides. One checkpoint write every 5 minutes (80% timed) doesn't seem too bad. Something else (probably the very slow fsync) is keeping the checkpoint writing rate low, and there appear to be a lot more fsync's than checkpoint segments.
Stephen