Re: deadlock balance_dirty_pages() to be expected?

Hello Fengguang,

On 10/07/2011 03:37 PM, Wu Fengguang wrote:
Hi Bernd,

On Fri, Oct 07, 2011 at 08:34:33PM +0800, Bernd Schubert wrote:
Hello,

while I'm working on the page cache mode in FhGFS (*), I noticed a
deadlock in balance_dirty_pages().

sysrq-w showed that it never started background write-out due to

if (bdi_nr_reclaimable > bdi_thresh) {
	pages_written += writeback_inodes_wb(&bdi->wb,
					     write_chunk);


and therefore also did not leave that loop with

	if (pages_written >= write_chunk)
		break;	/* We've done our duty */
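
(For context, a heavily simplified sketch of the 3.1-era balance_dirty_pages()
throttling loop; this is paraphrased from memory rather than quoted verbatim
from mm/page-writeback.c, so take the details with a grain of salt:)

for (;;) {
	/* recompute global and per-bdi dirty page counts and thresholds ... */

	if (bdi_nr_reclaimable > bdi_thresh) {
		/* synchronous writeback of this bdi's dirty inodes */
		pages_written += writeback_inodes_wb(&bdi->wb, write_chunk);
	}

	if (pages_written >= write_chunk)
		break;	/* We've done our duty */

	/* still over the dirty limits: sleep and retry -- this is the
	 * io_schedule_timeout() frame seen in the sysrq-w trace below */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	io_schedule_timeout(pause);
}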


So my process stays in uninterruptible D state forever.

If writeback_inodes_wb() is not triggered, the process should still be
able to proceed, presumably with longer delays, but it should never be
stuck forever. That's because the flusher thread should still be cleaning
pages in the background, which will knock down the dirty count and
eventually unthrottle the dirtier process.
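
(For reference, the flusher's decision whether to keep doing background
writeback is based on the global dirty counts; roughly, from memory -- the
real code lives in fs/fs-writeback.c and may differ in detail:)

static inline bool over_bground_thresh(void)
{
	unsigned long background_thresh, dirty_thresh;

	global_dirty_limits(&background_thresh, &dirty_thresh);

	return global_page_state(NR_FILE_DIRTY) +
	       global_page_state(NR_UNSTABLE_NFS) > background_thresh;
}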

Hmm, that does not seem to work:

1330 pts/0 D+ 0:13 dd if=/dev/zero of=/mnt/fhgfs/testfile bs=1M count=100

So the process has been in D state ever since I wrote the first mail, just for a 100MB write. Even if it were still making progress, it would be extremely slow. sysrq-w then shows:

[ 6727.616976] SysRq : Show Blocked State
[ 6727.617575]   task                        PC stack   pid father
[ 6727.618252] dd              D 0000000000000000  3544  1330   1306 0x00000000
[ 6727.619002]  ffff88000ddfb9a8 0000000000000046 ffffffff81398627 0000000000000046
[ 6727.620157]  0000000000000000 ffff88000ddfa000 ffff88000ddfa000 ffff88000ddfbfd8
[ 6727.620466]  ffff88000ddfa010 ffff88000ddfa000 ffff88000ddfbfd8 ffff88000ddfa000
[ 6727.620466] Call Trace:
[ 6727.620466]  [<ffffffff81398627>] ? __schedule+0x697/0x7e0
[ 6727.620466]  [<ffffffff8109be70>] ? trace_hardirqs_on_caller+0x20/0x1b0
[ 6727.620466]  [<ffffffff8139884f>] schedule+0x3f/0x60
[ 6727.620466]  [<ffffffff81398c44>] schedule_timeout+0x164/0x2f0
[ 6727.620466]  [<ffffffff81070930>] ? lock_timer_base+0x70/0x70
[ 6727.620466]  [<ffffffff81397bc9>] io_schedule_timeout+0x69/0x90
[ 6727.620466]  [<ffffffff81109854>] balance_dirty_pages_ratelimited_nr+0x234/0x640
[ 6727.620466]  [<ffffffff8110070f>] ? iov_iter_copy_from_user_atomic+0xaf/0x180
[ 6727.620466]  [<ffffffff811009ae>] generic_file_buffered_write+0x1ce/0x270
[ 6727.620466]  [<ffffffff811015dc>] ? generic_file_aio_write+0x5c/0xf0
[ 6727.620466]  [<ffffffff81101358>] __generic_file_aio_write+0x238/0x460
[ 6727.620466]  [<ffffffff811015dc>] ? generic_file_aio_write+0x5c/0xf0
[ 6727.620466]  [<ffffffff811015f8>] generic_file_aio_write+0x78/0xf0
[ 6727.620466]  [<ffffffffa034f539>] FhgfsOps_aio_write+0xdc/0x144 [fhgfs]
[ 6727.620466]  [<ffffffff8115af8a>] do_sync_write+0xda/0x120
[ 6727.620466]  [<ffffffff8112146c>] ? might_fault+0x9c/0xb0
[ 6727.620466]  [<ffffffff8115b4b8>] vfs_write+0xc8/0x180
[ 6727.620466]  [<ffffffff8115b661>] sys_write+0x51/0x90
[ 6727.620466]  [<ffffffff813a3702>] system_call_fastpath+0x16/0x1b
[ 6727.620466] Sched Debug Version: v0.10, 3.1.0-rc9+ #47



Once I added basic inode->i_data.backing_dev_info bdi support to our
file system, the deadlock did not happen anymore.

What's the workload and change exactly?

I wish I could simply send the patch, but until all the paperwork is done I'm not allowed to :(

The basic idea is:

1) During mount, when setting up the super block, starting from

static struct file_system_type fhgfs_fs_type =
{
	.mount = fhgfs_mount,
};

Then in fhgfs_mount():

bdi_setup_and_register(&sbInfo->bdi, "fhgfs", BDI_CAP_MAP_COPY);
sb->s_bdi = &sbInfo->bdi;
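
A slightly fuller sketch of how that mount path could look -- the real FhGFS
code is not public, so the fill_super helper and the fhgfs_sb_info layout
below are made up for illustration; only the two lines above reflect the
actual change:

#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/backing-dev.h>

struct fhgfs_sb_info {			/* hypothetical per-sb private data */
	struct backing_dev_info bdi;
	/* ... other FhGFS state ... */
};

/* hypothetical fill_super callback, called from fhgfs_mount() */
static int fhgfs_fill_super(struct super_block *sb, void *data, int silent)
{
	struct fhgfs_sb_info *sbInfo;
	int ret;

	sbInfo = kzalloc(sizeof(*sbInfo), GFP_KERNEL);
	if (!sbInfo)
		return -ENOMEM;

	/* give the file system its own backing_dev_info so that per-bdi
	 * dirty accounting and writeback have something to work on */
	ret = bdi_setup_and_register(&sbInfo->bdi, "fhgfs", BDI_CAP_MAP_COPY);
	if (ret) {
		kfree(sbInfo);
		return ret;
	}

	sb->s_fs_info = sbInfo;
	sb->s_bdi = &sbInfo->bdi;

	/* ... allocate the root inode, set sb->s_op, etc. ... */
	return 0;
}

On unmount the bdi then needs a matching bdi_destroy(&sbInfo->bdi)
(e.g. from the kill_sb/put_super path) before sbInfo is freed.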



2) When new (S_IFREG) inodes are allocated, for example from the handlers in

static struct inode_operations fhgfs_dir_inode_ops =
{
	.lookup = ...,	/* FhGFS handlers, names elided */
	.create = ...,
	.link   = ...,
};

inode->i_data.backing_dev_info = &sbInfo->bdi;
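
For illustration, a minimal sketch of where that assignment could live --
again the helper name and surrounding code are made up, only the
backing_dev_info line reflects the actual change:

/* hypothetical inode allocation helper; sbInfo as in the sketch above */
static struct inode *fhgfs_new_inode(struct super_block *sb, umode_t mode)
{
	struct fhgfs_sb_info *sbInfo = sb->s_fs_info;
	struct inode *inode = new_inode(sb);

	if (!inode)
		return NULL;

	inode->i_mode = mode;

	if (S_ISREG(mode)) {
		/* point the inode's address space at our bdi instead of the
		 * default one, so dirty pages of this file are accounted to
		 * (and written back via) the fhgfs bdi */
		inode->i_data.backing_dev_info = &sbInfo->bdi;
	}

	/* ... set i_op/i_fop, i_ino, timestamps, etc. ... */
	return inode;
}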



So my question is simply whether we should expect this deadlock if the file
system does not set up backing device information, and if so, shouldn't
this be documented?

Such a deadlock is not expected.

OK, thanks; then we should figure out why it happens. Due to a network outage here I won't have time before Monday to track down which kernel version introduced it, though.


Thanks,
Bernd