Hi,

On 5.8.2013 17:55, Tasos Petalas wrote:
>
> Seems most of the I/O is caused by SELECT backend processes (READ),
> whereas (WRITE) requests of the wal writer and checkpointer processes
> do not appear as top I/O processes (correct me if I am wrong)
>
> E.g. check the following heavy write process that reports 0% I/O ...!
>
> 14:09:40   769 be/4 enterpri   0.00 B/s  33.65 M/s  0.00 %  0.00 %
>            postgres: wal writer process

That's because the WAL writer does sequential I/O (writes), which is a
perfect match for SAS drives. OTOH the queries do a lot of random
reads, which is a terrible match for spinners.

>     That however still doesn't say which processes are responsible
>     for that. Is that background writer, backends running queries or
>     what? The iotop should give you answer to this (or at least a
>     hint).
>
> It seems most of the I/O is reported from backends running heavy
> concurrent select queries (see iotop attachment in previous email)

Yes, that seems to be the case.

>     Also, how are these volumes defined? Do they use distinct sets of
>     disks? How many disks are used for each volume?
>
> These are LUNs from the SAN (we have dedicated 16 SAS 2.5'' disks in
> RAID-10 topology in Storage)

I do understand these are LUNs from the SAN. I was asking whether there
are separate sets of disks for the data directory (which you mentioned
to be RAID-10) and pg_archives (which you mentioned to be RAID-5).
Although I doubt it'd be possible to use the same disk for two LUNs.

>     > Yes we are using 15K SAS disks in RAID 10. (253-2 dev refers to
>     > sar output for disks)
>
>     OK, so the pg_archives is probably for xlog archive, right?
>
> NO.
> /pg_archives is the target mount_point where we copy archive_logs to
> (archive_command = 'test ! -f /pg_archives/%f && cp %p /pg_archives/%f')

... which is exactly what a WAL archive is. That's why the GUC is
called archive_command.

>     I've checked the conf, and I think you should really consider
>     increasing checkpoint_segments - it's set to 3 (= 64MB) but I
>     think something like 32 (= 512MB) or even more would be more
>     appropriate.
>
> We use EDB dynatune. The actual setting can be found in the file
> (Ticket.Usual.Info.27.07.13.txt) of the initial e-mail --> check the
> "show all;" section.
> Current checkpoint_segments is set to 64.

OK, I'm not familiar with dynatune, and I got confused by the
postgresql.conf that you sent. 64 seems fine to me.

>     I see you've enabled log_checkpoints - can you check your logs
>     how often the checkpoints happen?
>
> This is the output of the checkpoints during peak hours (avg. every
> 2-5 minutes)
>
> 2013-08-02 14:00:20 UTC [767]: [19752]: [0] LOG:  checkpoint complete:
> wrote 55926 buffers (5.3%); 0 transaction log file(s) added, 0
> removed, 41 recycled; write=220.619 s, sync=5.443 s, total=226.152 s;
> sync files=220, longest=1.433 s, average=0.024 s
> 2013-08-02 14:05:14 UTC [767]: [19754]: [0] LOG:  checkpoint complete:
> wrote 109628 buffers (10.5%); 0 transaction log file(s) added, 0
> removed, 31 recycled; write=209.714 s, sync=9.513 s, total=219.252 s;
> sync files=222, longest=3.472 s, average=0.042 s

Meh, seems OK to me. My earlier suggestion was based on the incorrect
number of checkpoint segments ...

>     Also, can you check the pg_stat_bgwriter view? I'd bet the value
>     in checkpoints_timed is very low, compared to checkpoints_req. Or
>     even better, get the values from this view before / after running
>     the batch jobs.
>
> Results during load:
> checkpoints_timed: 12432, checkpoints_req: 3058

Again, seems fine.
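As a side note, a minimal way to take the before/after snapshot
suggested above might look like the query below. This is only a sketch,
assuming a stock pg_stat_bgwriter view (available since PostgreSQL 8.3);
run it once before and once after the batch jobs and compare the deltas:

    SELECT now() AS sampled_at,   -- when this snapshot was taken
           checkpoints_timed,     -- checkpoints started by checkpoint_timeout
           checkpoints_req,       -- checkpoints forced by xlog growth (checkpoint_segments)
           buffers_checkpoint,    -- buffers written by the checkpointer
           buffers_clean,         -- buffers written by the background writer
           buffers_backend        -- buffers written directly by backends
      FROM pg_stat_bgwriter;

A large increase in buffers_backend relative to buffers_checkpoint and
buffers_clean during the run would suggest the backends are having to
write out dirty buffers themselves.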
>     In the afternoon it's a different story - for 253-2 it looks like
>     this:
>
>                   DEV      tps  rd_sec/s  wr_sec/s  await   %util
>     15:50:01 dev253-2  4742.91  33828.98  29156.17  84.84  105.14
>     16:00:01 dev253-2  2781.05  12737.41  18878.52  19.24   80.53
>     16:10:01 dev253-2  3661.51  20950.64  23758.96  36.86   99.03
>     16:20:01 dev253-2  5011.45  32454.33  31895.05  72.75  102.38
>     16:30:01 dev253-2  2638.08  14661.23  17853.16  25.24   75.64
>     16:40:01 dev253-2  1988.95   5764.73  14190.12  45.05   58.80
>     16:50:01 dev253-2  2185.15  88296.73  11806.38   7.46   84.37
>     17:00:01 dev253-2  2031.19  12835.56  12997.34   8.90   82.62
>     17:10:01 dev253-2  4009.24  34288.71  23974.92  38.07  103.01
>     17:20:01 dev253-2  3605.86  26107.83  22457.41  45.76   90.90
>     17:30:01 dev253-2  2550.47   7496.85  18267.07  19.10   65.87
>
> This is when the actual problem arises

Well, then I think it's mostly about the SELECT queries.

>     What I think you could/should do:
>
>     * move pg_xlog to a separate device (not simply a volume on the
>       SAN, sharing disks with the other volumes - that won't give you
>       anything)
>
> Unfortunately we cannot do so at the moment (all available SAN
> resources are assigned to the pg_data directory of the server)
>
>     I'd expect these changes to improve the afternoon peak, as it's
>     doing about 50% writes. However I would not expect this to improve
>     the morning peak, because that's doing a lot of reads (not
>     writes).
>
> The afternoon peak is what we need to troubleshoot (will check if we
> can assign pg_xlog to a different LUN - not an option currently)

OK, understood. It's difficult to predict the gain, and given the iotop
output it might even cause harm.

> Will SSD improve write performance? We are thinking of moving in this
> direction.

It'll certainly improve random I/O in general, which is the main issue
with the SELECT queries. The sequential read/write improvement probably
won't be that significant.

Tomas
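If you want to confirm which SELECT queries are behind the random
reads, one possible follow-up is sketched below. It assumes the
pg_stat_statements extension is installed (it is not mentioned anywhere
in this thread, so treat it as a suggestion) and a server running
PostgreSQL 9.2 or later, since the buffer columns do not exist in older
releases:

    -- statements ranked by blocks that were not found in shared_buffers
    -- (i.e. had to come from the OS cache or the disks)
    SELECT calls,
           shared_blks_read,   -- blocks read from outside shared_buffers
           shared_blks_hit,    -- blocks served from shared_buffers
           query
      FROM pg_stat_statements
     ORDER BY shared_blks_read DESC
     LIMIT 10;

Statements with a high shared_blks_read count are the most likely
drivers of the random read load visible in iotop and sar.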