On Mon, Nov 14, 2011 at 2:57 PM, Tomas Vondra <tv@xxxxxxxx> wrote:
> On 14 November 2011, 22:58, Cody Caughlan wrote:
>>> Seems reasonable, although I'd bump up the checkpoint_timeout (the 5m
>>> is usually too low).
>>
>> Ok, will do.
>
> Yes, but find out what that means and think about the possible impact
> first. It usually improves the checkpoint behaviour but increases the
> recovery time, and you may need more checkpoint segments. And I doubt
> this will fix the issue you've described.

Ok, I understand the ramifications of increasing the checkpoint timeout.
But I will investigate more before I jump in.

>> I ran bonnie++ on a slave node, doing active streaming replication but
>> otherwise idle:
>> http://batch-files-test.s3.amazonaws.com/sql03.prod.html
>>
>> bonnie++ on the master node:
>> http://batch-files-test.s3.amazonaws.com/sql01.prod.html
>>
>> If I am reading this right (this is my first time using it), the
>> numbers don't look too good.
>
> Are those instances equal, i.e. do they use the same RAID10 config etc.?
> It surprises me a bit that the slave performs much better than the
> master; for example, the sequential reads are much faster (210MB/s vs.
> 60MB/s) and it handles about twice the number of seeks (345 vs. 170).
> But this may be skewed because of the workload.

Yes, these two machines are the same. sql01 is the master node and is
quite busy. Running bonnie++ on it during its normal workload spiked I/O
for the duration. sql03 is a pure slave and is quite idle, save for
receiving WAL segments.

>> Here's a gist of running "iostat -x 3" for a few minutes:
>>
>> https://gist.github.com/f94d98f2ef498a522ac2
>>
>> Indeed, the %iowait and await values can spike up drastically.
>
> OK, so xvdb-xvde are individual drives and dm-0 is the RAID10 device,
> right? According to the log_checkpoints info, you're writing about
> 15000 buffers (120MB) in 270s, i.e. about 440kB/s. But according to the
> iostat you're writing up to 4MB/s, so it's not just about the
> checkpoints.
>
> What else is going on there? How much WAL do you write?

Yes, dm-0 is the RAID10 device. The WAL config is:

wal_buffers = 16MB
checkpoint_segments = 32
max_wal_senders = 10
checkpoint_completion_target = 0.9
checkpoint_timeout = 300
wal_keep_segments = 1024

>
> Do you have iotop installed? That might give you a hint what processes
> are writing data etc.

I do have iotop and have been watching it. The only I/O users are
postgres and its backends. I don't see anything else consuming any I/O.

By eyeballing iotop, the big consumers of disk writes are:

idle in transaction, SELECT, COMMIT

The first two are what I would think would be largely read operations
(certainly the SELECT), so it's not clear why a SELECT consumes write
time.

Here is the output of some pg_stat_bgwriter stats from the last couple
of hours:

https://gist.github.com/41ee26caca01471a9b77

One thing that I might not have made very clear earlier is that this DB,
especially a single table, receives a very large number of UPDATEs.
However, it seems to be well cached; I have shared_buffers = 3584MB and
a view of pg_buffercache shows:

https://gist.github.com/53c520571290cae14613

Is it possible that we're just trying to handle too many UPDATEs, and
they are all trying to hit disk all at once, causing this I/O contention?
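For what it's worth, the numbers in that bgwriter gist come from the
standard pg_stat_bgwriter view; something along these lines shows the
split between checkpoint writes, background-writer writes, and writes
done directly by backends (standard columns, nothing specific to our
setup):

    SELECT checkpoints_timed,
           checkpoints_req,
           buffers_checkpoint,   -- buffers written at checkpoint time
           buffers_clean,        -- buffers written by the background writer
           maxwritten_clean,
           buffers_backend,      -- buffers written directly by backends
           buffers_alloc
      FROM pg_stat_bgwriter;

If buffers_backend is climbing much faster than buffers_checkpoint plus
buffers_clean, the backends are evicting dirty buffers themselves to
make room, which would also help explain SELECTs showing write time in
iotop (that, plus hint-bit updates on recently modified rows).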
Here is a view of pg_stat_user_tables that shows the amount of live/dead tuples: https://gist.github.com/5ac1ae7d11facd72913f
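For reference, a query roughly like this (just ordinary
pg_stat_user_tables columns, not necessarily the exact one behind the
gist) produces that kind of listing:

    SELECT relname,
           n_live_tup,       -- estimated live tuples
           n_dead_tup,       -- estimated dead tuples awaiting vacuum
           n_tup_upd,        -- total updates
           n_tup_hot_upd     -- HOT updates, which avoid index writes
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;

On the heavily UPDATEd table, the n_tup_hot_upd vs. n_tup_upd ratio may
also be worth a look: a low HOT ratio means each update dirties index
pages too, multiplying the write volume.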