Hi! I've just been experimenting with the dm snapshot target to test a tool I've recently written for the ext4 filesystem (it resizes inode tables without recreating the fs). The underlying device is /dev/sdc1 on a SATA hard drive; the COW device is a loop device attached to a sparse file on an ext4 /home filesystem.
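For reference, the snapshot was set up roughly like this (a sketch; the backing file path and the 4 KiB chunk size here are illustrative, only the device name sdc1ovl is the real one):

    truncate -s 3T /home/cow.img               # sparse COW backing file
    COW=$(losetup --find --show /home/cow.img)
    SIZE=$(blockdev --getsz /dev/sdc1)         # origin size in 512-byte sectors
    # persistent (P) snapshot of /dev/sdc1, 8-sector (4 KiB) chunks
    echo "0 $SIZE snapshot /dev/sdc1 $COW P 8" | dmsetup create sdc1ovl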
At first there was no problem and read/write performance was good. But then my tool started to move the inode tables (~4.5 GB in total on a 3 TB partition) and hung in 'D' state - it has been stuck like that for almost 2 hours now :) 'sync' executed in another console also hangs. There are messages in dmesg:
[654255.339515] INFO: task sync:28318 blocked for more than 120 seconds.
[654255.339522] Tainted: G W 3.12-trunk-amd64 #1
[654255.339525] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[654255.339528] sync D ffff8800a7de8b00 0 28318 27636 0x20020000
[654255.339534] ffff8800a7de87c0 0000000000000082 00000000000142c0 ffff88008c3e3fd8
[654255.339539] 00000000000142c0 ffff88008c3e3fd8 ffff88011fc14b00 ffff88011ffce908
[654255.339543] 0000000000000002 ffffffff8110f530 ffff88008c3e3e20 000000002baa1371
[654255.339548] Call Trace:
[654255.339558] [<ffffffff8110f530>] ? wait_on_page_read+0x60/0x60
[654255.339564] [<ffffffff81489194>] ? io_schedule+0x94/0x120
[654255.339568] [<ffffffff8110f535>] ? sleep_on_page+0x5/0x10
[654255.339574] [<ffffffff814871d4>] ? __wait_on_bit+0x54/0x80
[654255.339578] [<ffffffff8110f34f>] ? wait_on_page_bit+0x7f/0x90
[654255.339583] [<ffffffff8107a440>] ? wake_atomic_t_function+0x30/0x30
[654255.339589] [<ffffffff8111bc28>] ? pagevec_lookup_tag+0x18/0x20
[654255.339593] [<ffffffff8110f438>] ? filemap_fdatawait_range+0xd8/0x150
[654255.339598] [<ffffffff8119bf30>] ? sync_inodes_one_sb+0x20/0x20
[654255.339603] [<ffffffff811a62ce>] ? iterate_bdevs+0xbe/0xf0
[654255.339607] [<ffffffff8119c199>] ? sys_sync+0x69/0x90

Interestingly, there are no such messages about my tool, although it is also stuck in D state and I can't interrupt it with any signal.
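By the way, I can still pull the tool's kernel stack by hand; roughly like this ('mytool' is just a placeholder for the actual process name):

    # kernel stack of the D-state process (as root)
    cat /proc/$(pgrep mytool)/stack
    # or dump all blocked tasks at once via SysRq
    echo w > /proc/sysrq-trigger; dmesg | tail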
At the same time the system as a whole is still responsive, and it's clear that some work is being done - the space occupied by the sparse file backing the snapshot grows very slowly, about 1-2 megabytes per minute. There's massive I/O wait (~60% CPU), NO user (0%) and NO system (0%) CPU usage, and almost NO real disk I/O - 'iostat 1' shows mostly zeroes, with only occasional small 'written' values on the device holding that sparse file.
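I'm watching the file's real allocation roughly like this (the path is again a placeholder):

    # allocated blocks of the sparse COW file, refreshed every 60 seconds
    watch -n 60 'stat -c "%b blocks of %B bytes" /home/cow.img'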
'slabtop' shows very large, and roughly equal, numbers of bio-0, bio-1, dm_io and buffer_head objects. The kmalloc-32, dm_snap_pending_exception and kcopyd_job counts are lower, but their usage is also close to 100%. Most of these numbers change slightly over time, except for kcopyd_job, which is stuck at 77490. It looks like the following:
   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1311206 634877  48%    0.10K  35438       37    141752K buffer_head
 621400 621304  99%    0.19K  31070       20    124280K bio-0
 621184 621080  99%    0.04K   6752       92     27008K dm_io
 621090 621080  99%    0.25K  41406       15    165624K bio-1
  88816  86981  97%    0.03K    793      112      3172K kmalloc-32
  77959  77889  99%    0.10K   2107       37      8428K dm_snap_pending_exception
  77490  77490 100%    3.23K  38745        2    309960K kcopyd_job

(The full /proc/slabinfo is attached.)

'dmsetup status' prints the following, and the output does not change over time:

sdc1ovl: 0 5860531087 snapshot 806488/5860531087 728
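If I read the snapshot status fields as <allocated sectors>/<total sectors> <metadata sectors>, only ~394 MiB of the COW has been allocated so far. A quick sketch of how I decode it (same device name as above):

    # fields: name start length target allocated/total metadata (sectors)
    dmsetup status sdc1ovl | awk '{ split($5, a, "/");
        printf "COW used: %.1f MiB of %.1f GiB, metadata: %d KiB\n",
               a[1]/2048, a[2]/2097152, $6/2 }'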
I've googled for LVM snapshot performance problems and there are many threads about them, but as far as I understand, something similar to my case came up only in a 2005 thread here:
http://www.redhat.com/archives/dm-devel/2005-June/msg00126.html

and maybe also in another thread here:

http://www.redhat.com/archives/dm-devel/2006-September/msg00078.html

What do you think this problem could be? Thanks in advance to anyone who answers :-)
--
With best regards,
Vitaliy Filippov
[Attachment: slabinfo]