NeilBrown <neilb@xxxxxxx> writes: >> Synchronous notification of errors. If we don't try to write everything >> back immediately after the size change, we don't see dirty pages in >> zapped regions until the writeout/page cache management takes it into >> its head to try to clean the pages. >> > > So if you just want synchronous errors, I think you want: > fsync_bdev() > > which calls sync_filesystem() if it can find a filesystem, else > sync_blockdev(); (sync_filesystem itself calls sync_blockdev too). ... which deadlocks md. ;-) writeback_inodes_sb_nr is waiting for the flusher thread to write back the dirty data. The flusher thread is stuck in md_write_start, here: wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING, &mddev->flags)); This is after reverting your change, and replacing the flush_disk call in check_disk_size_change with a call to fsync_bdev. I'm not familiar enough with md to really suggest a way forward. Neil? Cheers, Jeff md127: detected capacity change from 267386880 to 401080320 INFO: task md127_raid5:2255 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. md127_raid5 D ffff88011d010690 5416 2255 2 0x00000080 ffff88010fcc7990 0000000000000046 ffff880100000000 ffffffff812070c9 0000000000014d00 ffff88011d010100 ffff88011d010690 ffff88010fcc7fd8 ffff88011d010698 0000000000014d00 ffff88010fcc6010 0000000000014d00 Call Trace: [<ffffffff812070c9>] ? cpumask_next_and+0x29/0x50 [<ffffffff81493df5>] schedule_timeout+0x265/0x2d0 [<ffffffff8104b341>] ? enqueue_task+0x61/0x80 [<ffffffff81493a25>] wait_for_common+0x115/0x180 [<ffffffff81057850>] ? default_wake_function+0x0/0x10 [<ffffffff81493b38>] wait_for_completion+0x18/0x20 [<ffffffff8115cce2>] writeback_inodes_sb_nr+0x72/0xa0 [<ffffffff8115cfad>] writeback_inodes_sb+0x4d/0x60 [<ffffffff81162499>] __sync_filesystem+0x49/0x90 [<ffffffff81162592>] sync_filesystem+0x32/0x60 [<ffffffff8116bc59>] fsync_bdev+0x29/0x70 [<ffffffff8116bcea>] check_disk_size_change+0x4a/0xb0 [<ffffffff81208e27>] ? kobject_put+0x27/0x60 [<ffffffff8116bdaf>] revalidate_disk+0x5f/0x90 [<ffffffffa031155a>] raid5_finish_reshape+0x9a/0x1e0 [raid456] [<ffffffff8138a933>] reap_sync_thread+0x63/0x130 [<ffffffff8138c8a6>] md_check_recovery+0x1f6/0x6f0 [<ffffffffa03150ab>] raid5d+0x3b/0x610 [raid456] [<ffffffff810804c9>] ? prepare_to_wait+0x59/0x90 [<ffffffff81387ee9>] md_thread+0x119/0x150 [<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81387dd0>] ? md_thread+0x0/0x150 [<ffffffff8107fb56>] kthread+0x96/0xa0 [<ffffffff8100cc04>] kernel_thread_helper+0x4/0x10 [<ffffffff8107fac0>] ? kthread+0x0/0xa0 [<ffffffff8100cc00>] ? kernel_thread_helper+0x0/0x10 INFO: task flush-9:127:2288 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. flush-9:127 D ffff88011cee30a0 4664 2288 2 0x00000080 ffff88011b0af6a0 0000000000000046 0000000000000000 0000000000000000 0000000000014d00 ffff88011cee2b10 ffff88011cee30a0 ffff88011b0affd8 ffff88011cee30a8 0000000000014d00 ffff88011b0ae010 0000000000014d00 Call Trace: [<ffffffff8138bbb5>] md_write_start+0xa5/0x1c0 [<ffffffff810801d0>] ? autoremove_wake_function+0x0/0x40 [<ffffffffa0316435>] make_request+0x45/0x6c0 [raid456] [<ffffffff811fbfcb>] ? blkiocg_update_dispatch_stats+0x8b/0xd0 [<ffffffff81385ca3>] md_make_request+0xd3/0x210 [<ffffffff811ee9da>] generic_make_request+0x2ea/0x5d0 [<ffffffff810e9cde>] ? mempool_alloc+0x5e/0x140 [<ffffffff811eed41>] submit_bio+0x81/0x110 [<ffffffff811699c6>] ? bio_alloc_bioset+0x56/0xf0 [<ffffffff81163ef6>] submit_bh+0xe6/0x140 [<ffffffff81165ad0>] __block_write_full_page+0x200/0x390 [<ffffffff811655a0>] ? end_buffer_async_write+0x0/0x1a0 [<ffffffff8116667e>] block_write_full_page_endio+0xde/0x110 [<ffffffffa037d3b0>] ? buffer_unmapped+0x0/0x20 [ext3] [<ffffffff811666c0>] block_write_full_page+0x10/0x20 [<ffffffffa037de6d>] ext3_writeback_writepage+0x11d/0x170 [ext3] [<ffffffff810f0152>] __writepage+0x12/0x40 [<ffffffff810f12b4>] write_cache_pages+0x1a4/0x490 [<ffffffff810f0140>] ? __writepage+0x0/0x40 [<ffffffff810f15bf>] generic_writepages+0x1f/0x30 [<ffffffff810f15f5>] do_writepages+0x25/0x30 [<ffffffff8115d5f0>] writeback_single_inode+0x90/0x220 [<ffffffff8115d9b6>] writeback_sb_inodes+0xc6/0x170 [<ffffffff8115dd3f>] wb_writeback+0x17f/0x430 [<ffffffff8106e217>] ? lock_timer_base+0x37/0x70 [<ffffffff8115e08d>] wb_do_writeback+0x9d/0x270 [<ffffffff8106e330>] ? process_timeout+0x0/0x10 [<ffffffff8115e302>] bdi_writeback_thread+0xa2/0x280 [<ffffffff8115e260>] ? bdi_writeback_thread+0x0/0x280 [<ffffffff8115e260>] ? bdi_writeback_thread+0x0/0x280 [<ffffffff8107fb56>] kthread+0x96/0xa0 [<ffffffff8100cc04>] kernel_thread_helper+0x4/0x10 [<ffffffff8107fac0>] ? kthread+0x0/0xa0 [<ffffffff8100cc00>] ? kernel_thread_helper+0x0/0x10 INFO: task updatedb:2342 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. updatedb D ffff88011bd77af0 5136 2342 2323 0x00000080 ffff88011c877cb8 0000000000000086 0000000000000000 ffff88011c1829a8 0000000000014d00 ffff88011bd77560 ffff88011bd77af0 ffff88011c877fd8 ffff88011bd77af8 0000000000014d00 ffff88011c876010 0000000000014d00 Call Trace: [<ffffffff81165130>] ? sync_buffer+0x0/0x50 [<ffffffff8149382b>] io_schedule+0x6b/0xb0 [<ffffffff8116516b>] sync_buffer+0x3b/0x50 [<ffffffff81494057>] __wait_on_bit+0x57/0x80 [<ffffffff811699c6>] ? bio_alloc_bioset+0x56/0xf0 [<ffffffff81165130>] ? sync_buffer+0x0/0x50 [<ffffffff814940f3>] out_of_line_wait_on_bit+0x73/0x90 [<ffffffff81080210>] ? wake_bit_function+0x0/0x40 [<ffffffff81165126>] __wait_on_buffer+0x26/0x30 [<ffffffffa038006c>] ext3_bread+0x5c/0x80 [ext3] [<ffffffffa037ba63>] ext3_readdir+0x1f3/0x600 [ext3] [<ffffffff8114a650>] ? filldir+0x0/0xe0 [<ffffffff8114a650>] ? filldir+0x0/0xe0 [<ffffffff8114a7e0>] vfs_readdir+0xb0/0xd0 [<ffffffff8114a964>] sys_getdents+0x84/0xf0 [<ffffffff8100bdd2>] system_call_fastpath+0x16/0x1b diff --git a/block/genhd.c b/block/genhd.c index cbf1112..6a5b772 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1355,7 +1355,7 @@ int invalidate_partition(struct gendisk *disk, int partno) struct block_device *bdev = bdget_disk(disk, partno); if (bdev) { fsync_bdev(bdev); - res = __invalidate_device(bdev, true); + res = __invalidate_device(bdev); bdput(bdev); } return res; diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index 77fc76f..b9ba04f 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -3281,7 +3281,7 @@ static int set_geometry(unsigned int cmd, struct floppy_struct *g, struct block_device *bdev = opened_bdev[cnt]; if (!bdev || ITYPE(drive_state[cnt].fd_device) != type) continue; - __invalidate_device(bdev, true); + __invalidate_device(bdev); } mutex_unlock(&open_lock); } else { diff --git a/fs/block_dev.c b/fs/block_dev.c index 8892870..5aae241 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -934,9 +934,9 @@ EXPORT_SYMBOL_GPL(bd_unlink_disk_holder); * when a disk has been changed -- either by a media change or online * resize. */ -static void flush_disk(struct block_device *bdev, bool kill_dirty) +static void flush_disk(struct block_device *bdev) { - if (__invalidate_device(bdev, kill_dirty)) { + if (__invalidate_device(bdev)) { char name[BDEVNAME_SIZE] = ""; if (bdev->bd_disk) @@ -973,7 +973,7 @@ void check_disk_size_change(struct gendisk *disk, struct block_device *bdev) "%s: detected capacity change from %lld to %lld\n", name, bdev_size, disk_size); i_size_write(bdev->bd_inode, disk_size); - flush_disk(bdev, false); + fsync_bdev(bdev); } } EXPORT_SYMBOL(check_disk_size_change); @@ -1026,7 +1026,7 @@ int check_disk_change(struct block_device *bdev) if (!(events & DISK_EVENT_MEDIA_CHANGE)) return 0; - flush_disk(bdev, true); + flush_disk(bdev); if (bdops->revalidate_disk) bdops->revalidate_disk(bdev->bd_disk); return 1; @@ -1607,7 +1607,7 @@ fail: } EXPORT_SYMBOL(lookup_bdev); -int __invalidate_device(struct block_device *bdev, bool kill_dirty) +int __invalidate_device(struct block_device *bdev) { struct super_block *sb = get_super(bdev); int res = 0; @@ -1620,7 +1620,7 @@ int __invalidate_device(struct block_device *bdev, bool kill_dirty) * hold). */ shrink_dcache_sb(sb); - res = invalidate_inodes(sb, kill_dirty); + res = invalidate_inodes(sb); drop_super(sb); } invalidate_bdev(bdev); diff --git a/fs/inode.c b/fs/inode.c index 0647d80..9c2b795 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -548,14 +548,11 @@ void evict_inodes(struct super_block *sb) /** * invalidate_inodes - attempt to free all inodes on a superblock * @sb: superblock to operate on - * @kill_dirty: flag to guide handling of dirty inodes * * Attempts to free all inodes for a given superblock. If there were any * busy inodes return a non-zero value, else zero. - * If @kill_dirty is set, discard dirty inodes too, otherwise treat - * them as busy. */ -int invalidate_inodes(struct super_block *sb, bool kill_dirty) +int invalidate_inodes(struct super_block *sb) { int busy = 0; struct inode *inode, *next; @@ -567,10 +564,6 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty) list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) continue; - if (inode->i_state & I_DIRTY && !kill_dirty) { - busy = 1; - continue; - } if (atomic_read(&inode->i_count)) { busy = 1; continue; diff --git a/fs/internal.h b/fs/internal.h index f3d15de..bee95ea 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -125,4 +125,4 @@ extern long do_handle_open(int mountdirfd, */ extern int get_nr_dirty_inodes(void); extern void evict_inodes(struct super_block *); -extern int invalidate_inodes(struct super_block *, bool); +extern int invalidate_inodes(struct super_block *); diff --git a/include/linux/fs.h b/include/linux/fs.h index 13df14e..ff9a159 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2155,7 +2155,7 @@ extern void check_disk_size_change(struct gendisk *disk, struct block_device *bdev); extern int revalidate_disk(struct gendisk *); extern int check_disk_change(struct block_device *); -extern int __invalidate_device(struct block_device *, bool); +extern int __invalidate_device(struct block_device *); extern int invalidate_partition(struct gendisk *, int); #endif unsigned long invalidate_mapping_pages(struct address_space *mapping, -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html