Re: xfs task blocked for more than 120 seconds

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 30 Jan 2012 12:05:30 +1100

On Mon, Jan 30, 2012 at 02:20:26AM +0200, Sami Liedes wrote:
> Hi!
> 
> Following the LWN coverage of LCA talk on XFS, I decided to give XFS a
> try again. So I decided to try XFS on my backup partition.
> 
> My setup is as follows:
> 
> * The relevant disk, containing both the ext4 partition
>   (/media/backup) and the xfs partition (/media/xfstest) are on the
>   same 2 TB disk, which is a Western Digital Caviar Green SATA3 drive
>   (SMART reports the device model to be WDC WD20EARX-00PASB0). The
>   spindle speed of the model is reported to be "closer to 5400 than
>   7200 rpm".

Ok, so your workload is seeking repeatedly between the inner and out
portions of the disk, so will be seek bound and quite slow.

> * The ext4 partition is roughly 1 TB in size and the xfs partition
>   roughly 800 GB
> 
> * I'm using LVM; the logical volumes are /dev/rootvg/backup and
>   /dev/rootvg/xfstest
> 
> * These LVs are both LUKS (dm-crypt) partitions, encrypted with
>   cbc-essiv:sha256
> 
> * The computer is a Core i7 2600 3.4 GHz with 4 cores and HT
>   (therefore shows as 8 cores) with 8 GiB main memory. AES-NI
>   instructions are supported and disk crypto generally (with ext4)
>   works at transparent speeds.

That's not to say that ext4 doesn't have long IO hold-offs - it just
doesn't trigger the hang-check code.

> 
> * I'm running Debian unstable amd64 with vanilla (non-Debian) 3.2.2
>   kernel
> 
> * I believe both the disk and the main memory are in good health;
>   particularly with ext4 there hasn't been any problems
> 
> * The backup data is generated by BackupPC. It generally consists of a
>   large pool of files named by sha-1 hashes (~105 GB of files), plus
>   lots of trees of hardlinks to these in a structure not unlike
>   typical home directories and / and /usr trees. I *think* the hard
>   links may be a red herring and I suspect the problem has something
>   to do with xfs--dm-crypt interaction, but I'm not sure.

It could very well be a dm-crypt/xfs interaction, though I don't
have enough information yet.

> So, to copy the contents of the backup partition -- BackupPC data --
> to the XFS drive, I did the following (the tar line is from shell
> history, the lines before it are from my memory and might be slightly
> wrong):
> 
> # cryptsetup luksFormat /dev/rootvg/xfstest
> # cryptsetup luksOpen /dev/rootvg/xfstest rootvg-xfstest_crypt
> # mkfs.xfs /dev/mapper/rootvg-xfstest_crypt
> # mount /dev/mapper/rootvg-xfstest_crypt /media/xfstest
> # cd /media/backup
> # tar cf - backuppc/ | (cd ../xfstest && tar xf -)
> 
> During this operation, I got the following two warning messages in
> dmesg:
> 
> ------------------------------------------------------------
> [    0.000000] Linux version 3.2.2 (sliedes@lh) (gcc version 4.6.2 (Debian 4.6.2-12) ) #1 SMP Thu Jan 26 03:51:41 EET 2012
> [...]
> [19419.407855] XFS (dm-7): Mounting Filesystem
> [19419.726935] XFS (dm-7): Ending clean mount
> [...]
> [35380.253189] INFO: task tar:3312 blocked for more than 120 seconds.
> [35380.253192] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [35380.253194] tar             D ffff880220adc678     0  3312   3311 0x00000000
> [35380.253198]  ffff880107a1d778 0000000000000082 0000000000000000 ffff8800c552be30
> [35380.253202]  ffff880220adc2f0 ffff880107a1dfd8 ffff880107a1dfd8 ffff880107a1dfd8
> [35380.253206]  ffff880223d01650 ffff880220adc2f0 0000000000000000 ffffffff00000000
> [35380.253210] Call Trace:
> [35380.253217]  [<ffffffff813ed17a>] schedule+0x3a/0x50
> [35380.253220]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
> [35380.253237]  [<ffffffffa0a85ea5>] ? _xfs_buf_find+0xb5/0x240 [xfs]
> [35380.253240]  [<ffffffff813ee257>] __down_common+0x95/0xe5
> [35380.253249]  [<ffffffffa0a85ebe>] ? _xfs_buf_find+0xce/0x240 [xfs]
> [35380.253252]  [<ffffffff813ee306>] __down+0x18/0x1a
> [35380.253255]  [<ffffffff8107214c>] down+0x3c/0x50
> [35380.253263]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [35380.253272]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [35380.253280]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [35380.253289]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [35380.253303]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [35380.253316]  [<ffffffffa0ae1345>] ? xfs_trans_brelse+0xd5/0x160 [xfs]
> [35380.253329]  [<ffffffffa0ac9732>] xfs_imap_to_bp.isra.9+0x42/0x120 [xfs]
> [35380.253341]  [<ffffffffa0acd441>] xfs_iread+0x81/0x1c0 [xfs]

Waiting for IO completion reading an inode buffer.

> [36339.636779] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36339.636782] tar             D ffff880220adc678     0  3312   3311 0x00000000
.....
> [36339.636838]  [<ffffffff8107214c>] down+0x3c/0x50
> [36339.636847]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [36339.636856]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [36339.636865]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [36339.636874]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [36339.636889]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [36339.636901]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]
> [36339.636904]  [<ffffffff8107218d>] ? up+0x2d/0x50
> [36339.636916]  [<ffffffffa0abb3cc>] xfs_da_read_buf+0x1c/0x20 [xfs]

Waiting for IO completion reading a directory buffer.

This typically happens when the buffer is already under IO (read or
write).

> ------------------------------------------------------------
> 
> Afterwards when BackupPC was doing a backup, I got these additional
> warnings:
> 
> ------------------------------------------------------------
> [91863.941820] INFO: task kworker/0:5:19584 blocked for more than 120 seconds.
> [91863.941823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [91863.941825] kworker/0:5     D ffff8802205699d8     0 19584      2 0x00000000
> [91863.941829]  ffff88005fe9bb70 0000000000000046 ffff88005fe9bb00 0000000000000086
> [91863.941833]  ffff880220569650 ffff88005fe9bfd8 ffff88005fe9bfd8 ffff88005fe9bfd8
> [91863.941837]  ffff88021e4c8000 ffff880220569650 ffff88005fe9bb80 ffffffff81044ecd
> [91863.941840] Call Trace:
> [91863.941847]  [<ffffffff81044ecd>] ? try_to_wake_up+0x22d/0x290
> [91863.941852]  [<ffffffff813ed17a>] schedule+0x3a/0x50
> [91863.941855]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
> [91863.941858]  [<ffffffff81115124>] ? kmem_cache_free+0x104/0x110
> [91863.941863]  [<ffffffff810d7172>] ? mempool_free_slab+0x12/0x20
> [91863.941866]  [<ffffffff813ee257>] __down_common+0x95/0xe5
> [91863.941886]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
> [91863.941889]  [<ffffffff813ee306>] __down+0x18/0x1a
> [91863.941892]  [<ffffffff8107214c>] down+0x3c/0x50
> [91863.941902]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [91863.941915]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
> [91863.941929]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
> [91863.941942]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
> [91863.941952]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
> [91863.941962]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
> [91863.941975]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
> [91863.941985]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]

Ok, that implies that the problem is dm-crypt. xfs_sync_worker()
calls xfs_fs_log_dummy() once every 30s. The first time it dirties
the superblock, logs it, then pushes it out for I/O. 30s later we've
come back and the superblock is still under IO. That directly
implies that dm-crypt hasn't finished the IO yet.

> [92103.787573] INFO: task cp:18226 blocked for more than 120 seconds.
> [92103.787576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [92103.787578] cp              D ffff8802078e19d8     0 18226  28931 0x00000000
....
> [92103.787640]  [<ffffffff813ee306>] __down+0x18/0x1a
> [92103.787643]  [<ffffffff8107214c>] down+0x3c/0x50
> [92103.787652]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [92103.787661]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [92103.787670]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [92103.787678]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [92103.787693]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [92103.787706]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]

And that's another directory read waiting on IO completion.

It is definitely a possibility that dm-crypt is not keeping up with
the IO that XFS is sending it and the way XFS blocks waiting for it
to complete triggers the hang-check code. However, it is possible
that XFS is stalling due to long IO completion latencies. Do the
workloads actually complete, or does the system hang? Also, does the
IO to the disk appear to stop for long periods, or is the disk 100%
busy the whole time? If the disk goes idle, can you get a dump of
the stalled processes via "echo w > /proc/sysrq-trigger" and post
that?

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs