To all who might be interested, I have an update on this.
I ran some tests on the old production DB, which was Postgres 8.3 (with only one disk for everything), using pgreplay to run the same queries as on the 9.1 server.
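For anyone who wants to reproduce this, here is a minimal sketch of such a run; the log path, hostname and sampling interval are assumptions, not the exact commands we used:

    # Replay the captured statement log against the old 8.3 box
    # (pgreplay parses a PostgreSQL log and re-issues the statements
    # with their original timing):
    pgreplay -h old-db-host -p 5432 /var/log/postgresql/postgresql.log &

    # Meanwhile, sample extended per-device statistics:
    iostat -x sda 2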
Here is the output of iostat for the 8.3 server (six consecutive samples):
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   230.00   22.00  106.00   308.00  2692.00    23.44     0.33    2.58   2.34  30.00
sda               0.00   166.00    9.50   65.50   160.00  1708.00    24.91     0.29    3.07   2.47  18.50
sda               0.00   236.50    7.50  118.50   120.00  2984.00    24.63     0.39    3.61   1.55  19.50
sda               0.00   310.50    7.50  168.50   112.00  3832.00    22.41     0.44    2.50   0.94  16.50
sda               0.00   321.50   22.50  184.00   320.00  4048.00    21.15     0.88    4.24   1.74  36.00
sda               0.00   266.00    4.50  155.00    64.00  3356.00    21.44     0.29    1.72   0.88  14.00
Here is the output for 9.1:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   85.00     0.00   352.00     8.28     0.29    3.46    0.00    3.46   3.46  29.40
sdb               0.00     0.50    0.00   97.00     0.00   450.00     9.28     0.39    4.04    0.00    4.04   3.79  36.80
sdb               0.00     0.00    0.00   87.00     0.00   376.00     8.64     0.29    3.29    0.00    3.29   3.29  28.60
sdb               0.00     0.50    0.00   92.00     0.00   386.00     8.39     0.32    3.43    0.00    3.43   3.28  30.20
sdb               0.00     0.00    0.00   89.50     0.00   388.00     8.67     0.33    3.66    0.00    3.66   3.66  32.80
sdb               0.00     0.00    0.00  104.50     0.00   432.00     8.27     0.38    3.62    0.00    3.62   3.62  37.80
(The columns are not the same, probably because the two servers run different Ubuntu versions, and therefore different sysstat versions. Note also the units: the 8.3 output reports rsec/s and wsec/s in 512-byte sectors, while the 9.1 output reports rkB/s and wkB/s in kB.)
What is notable is the following:
- The 8.3 box shows lower disk utilization (roughly 22% vs 33% on average) despite writing around four times as much data, and even though that single disk also serves everything else, the plain-text log for example.
- I think the main difference is the average request size. On 9.1 the requests seem to be roughly a third the size of 8.3's (avgrq-sz of about 8-9 sectors, i.e. roughly 4 kB, versus 21-25 sectors, roughly 11-12 kB), so 9.1 has to issue correspondingly more writes to move the same amount of data.
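To compare the two outputs in the same units, here is a quick back-of-the-envelope check on the first sample from each box (a sector is 512 bytes, i.e. half a kB, so dividing sectors by 2 gives kB):

    awk 'BEGIN {
        # numbers copied from the first sample of each output above
        printf "8.3: %4.0f kB/s written, avg request %4.1f kB\n", 2692 / 2, 23.44 / 2
        printf "9.1: %4.0f kB/s written, avg request %4.1f kB\n",  352,      8.28 / 2
    }'
    # 8.3: 1346 kB/s written, avg request 11.7 kB
    # 9.1:  352 kB/s written, avg request  4.1 kB

So the 8.3 box was pushing about four times the data in ~12 kB writes, while the 9.1 box is writing about 4 kB at a time.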
I think this might be because of the different WAL level: 8.3 predates the wal_level setting, and we simply had WAL archiving enabled (the equivalent of archive; hot_standby was not available yet), while on 9.1 we are using wal_level = hot_standby. Here is what the documentation says about this:
"It is thought that there is little measurable difference in performance between using hot_standby and archive levels, so feedback is welcome if any production impacts are noticeable."
Although, of course, this might just be a difference between Postgres releases.
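One way to separate the two variables would be to lower wal_level on the 9.1 box itself and replay again. A sketch, assuming Ubuntu's packaging layout (the config path is an assumption):

    # Temporarily set wal_level back to archive (needs a restart;
    # not while a hot standby is actively relying on it, of course),
    # then rerun the pgreplay test and capture iostat again:
    sudo sed -i 's/^wal_level.*/wal_level = archive/' /etc/postgresql/9.1/main/postgresql.conf
    sudo pg_ctlcluster 9.1 main restart

If the small-write pattern persists with wal_level = archive, the difference between releases becomes the more likely explanation.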
Any thoughts or feedback would be appreciated.
Cheers,
Germán Becker
On Tue, Jul 30, 2013 at 1:02 PM, bricklen <bricklen@xxxxxxxxx> wrote:
checkpoint_completion_target is to help with "checkpoint smoothing", to reduce the spike in disk I/O when shared_buffers are written out. Depesz has a good article about that: http://www.depesz.com/2010/11/03/checkpoint_completion_target/

On Tue, Jul 30, 2013 at 8:35 AM, German Becker <german.becker@xxxxxxxxx> wrote:
256 was set some time ago, when we were testing a different issue. I read that the only drawback is the amount of time required for recovery, which we tested: it was around 10 seconds with 256 segments, and higher values mean less disk usage.

Anyway, all of these parameters should affect the throughput to the data disks, not the WAL. Am I right?
Do your graphs show any correlation between the number of WAL segments getting recycled and disk I/O spikes? Are you logging checkpoints? If so, you could use the checkpoint times to compare against your I/O graphs. I am by no means an expert here; I'm just throwing out ideas (which might already have been suggested).
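For example, to get checkpoint timestamps to line up against the I/O graphs (a sketch; the log location assumes Ubuntu's default):

    # In postgresql.conf (a reload is enough, no restart needed):
    #   log_checkpoints = on
    # Then pull out the completion lines:
    grep 'checkpoint complete' /var/log/postgresql/postgresql-9.1-main.log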