Btrfs High IO-Wait

Hi,
I have high IO-wait on the OSDs (Ceph); the OSDs are running a v3.1-rc9 kernel.
I also see high IO rates, around 500 IO/s, as reported by iostat.

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    6.80     0.00    62.40    18.35     0.04    5.29    0.00    5.29   5.29   3.60
sdb               0.00   249.80    0.40  669.60     1.60  4118.40    12.30    87.47  130.56   15.00  130.63   1.01  67.40

For comparison, the same workload on an OSD that uses ext4 as the backing fs:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00   10.00     0.00   128.00    25.60     0.03    3.40    0.00    3.40   3.40   3.40
sdb               0.00    27.80    0.00   48.20     0.00   318.40    13.21     0.43    8.84    0.00    8.84   1.99   9.60
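
The device statistics above are iostat's extended output; just for reference, a minimal sketch of the kind of invocation that produces it (the 1-second interval here is only an example, not necessarily the exact interval used for the snapshots above):

root@s-brick-003:~# iostat -x 1   # extended per-device stats, refreshed every second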

iodump shows similar results; sdb is the data disk, sda7 the journal, and sda5 the root partition.

btrfs

root@s-brick-003:~# echo 1 > /proc/sys/vm/block_dump
root@s-brick-003:~# while true; do sleep 1; dmesg -c; done | perl /usr/local/bin/iodump
^C# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
btrfs-submit-0        8321      28040          0      28040          0 sdb
ceph-osd              8514        158          0        158          0 sda7
kswapd0                 46         81          0         81          0 sda1
bash                 10709         35         35          0          0 sda1
flush-8:0              962         12          0         12          0 sda5
kworker/0:1           8897          6          0          6          0 sdb
kworker/1:1          10354          3          0          3          0 sdb
kjournald              266          3          0          3          0 sda5
ceph-osd              8523          2          2          0          0 sda1
ceph-osd              8531          1          1          0          0 sda1
dmesg                10712          1          1          0          0 sda5


ext4

root@s-brick-002:~# echo 1 > /proc/sys/vm/block_dump
root@s-brick-002:~# while true; do sleep 1; dmesg -c; done | perl /usr/local/bin/iodump
^C# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
ceph-osd              3115        847          0        847          0 sdb
jbd2/sdb-8            2897        784          0        784          0 sdb
ceph-osd              3112        728          0        728          0 sda5, sdb
ceph-osd              3110        191          0        191          0 sda7
perl                  3628         13         13          0          0 sda5
flush-8:16            2901          8          0          8          0 sdb
kjournald              272          3          0          3          0 sda5
dmesg                 3630          1          1          0          0 sda5
sleep                 3629          1          1          0          0 sda5


I think this is the same problem as described in http://marc.info/?l=ceph-devel&m=131158049117139&w=2

I also ran latencytop, as Chris recommended in the above thread; the output is attached.
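
The attached dumps come from the latencytop tool itself; in case it helps, the raw counters it aggregates can also be read directly through the generic kernel interface (requires CONFIG_LATENCYTOP):

root@s-brick-003:~# echo 1 > /proc/sys/kernel/latencytop   # enable latency accounting
root@s-brick-003:~# cat /proc/latency_stats                # raw per-callchain latency counters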

Best Regards,
 martin

Attachment: latencytop.out_long_uptime.bz2
Description: application/bzip

Attachment: latencytop.out_short_uptime.bz2
Description: application/bzip

