On Tue, Oct 28, 2008 at 04:39:53PM +0100, Nick Piggin wrote:
> On Wed, Oct 29, 2008 at 01:47:15AM +1100, npiggin@xxxxxxx wrote:
> > OK, I'm happier with this patchset now. Note that I've taken your patch
> > and mangled it a bit at the end of the series.
> >
> > This one survives and seems to run OK here, but I'm mainly doing dumb
> > stress testing with a handful of filesystems, and data-io error injection
> > testing. There are a lot of combinations of ways this function can operate
> > and interact obviously, so it would be helpful to get more review.
> >
> > Chris, would you possibly have time to run your btrfs tests that are
> > sensitive to problems in this code? I could provide you a single patch
> > rollup against mainline if it helps.
>
> BTW. XFS seems to be doing something interesting with my simple sync
> test case with IO error injection. I map a file MAP_SHARED into a number of
> processes, which then each run a loop that dirties the memory then calls
> msync(MS_SYNC) on the range.
>
> ext2 mostly reports -EIO back to userspace when a failure is injected AFAIKS.
> ext3 (ordered) doesn't until a lot of errors have been injected, but eventually
> reports -EIO and shuts down the filesystem. reiserfs seems to report failure
> more consistently.
>
> I haven't seen any -EIO failures from XFS... maybe I'm just not doing the
> right thing, or there is a caveat I'm not aware of.
>
> All fault injections I noticed had a trace like this:
> FAULT_INJECTION: forcing a failure
> Call Trace:
> 9f9cd758: [<6019f1de>] random32+0xe/0x20
> 9f9cd768: [<601a31b9>] should_fail+0xd9/0x130
> 9f9cd798: [<6018d0c4>] generic_make_request+0x304/0x4e0
> 9f9cd7a8: [<60062301>] mempool_alloc+0x51/0x130
> 9f9cd858: [<6018e6bf>] submit_bio+0x4f/0xe0
> 9f9cd8a8: [<60165505>] xfs_submit_ioend_bio+0x25/0x40
> 9f9cd8c8: [<6016603c>] xfs_submit_ioend+0xbc/0xf0
> 9f9cd908: [<60166bf9>] xfs_page_state_convert+0x3d9/0x6a0
> 9f9cd928: [<6005d515>] delayacct_end+0x95/0xb0
> 9f9cda08: [<60166ffd>] xfs_vm_writepage+0x6d/0x110
> 9f9cda18: [<6006618b>] set_page_dirty+0x4b/0xd0
> 9f9cda58: [<60066115>] __writepage+0x15/0x40
> 9f9cda78: [<60066775>] write_cache_pages+0x255/0x470
> 9f9cda90: [<60066100>] __writepage+0x0/0x40
> 9f9cdb98: [<600669b0>] generic_writepages+0x20/0x30
> 9f9cdba8: [<60165ba3>] xfs_vm_writepages+0x53/0x70
> 9f9cdbd8: [<600669eb>] do_writepages+0x2b/0x40
> 9f9cdbf8: [<6006004c>] __filemap_fdatawrite_range+0x5c/0x70
> 9f9cdc58: [<6006026a>] filemap_fdatawrite+0x1a/0x20
> 9f9cdc68: [<600a7a05>] do_fsync+0x45/0xe0
> 9f9cdc98: [<6007794b>] sys_msync+0x14b/0x1d0
> 9f9cdcf8: [<60019a70>] handle_syscall+0x50/0x80
> 9f9cdd18: [<6002a10f>] userspace+0x44f/0x510
> 9f9cdfc8: [<60016792>] fork_handler+0x62/0x70

XFS reports bio errors through the I/O completion path, not the
submission path.

> And the kernel would sometimes say this:
> Buffer I/O error on device ram0, logical block 279
> lost page write due to I/O error on ram0
> Buffer I/O error on device ram0, logical block 379
> lost page write due to I/O error on ram0
> Buffer I/O error on device ram0, logical block 389
> lost page write due to I/O error on ram0

Yes - that's coming from end_buffer_async_write() when an error is
reported in bio completion. This does:

	set_bit(AS_EIO, &page->mapping->flags);
	set_buffer_write_io_error(bh);
	clear_buffer_uptodate(bh);
	SetPageError(page);

Hmmmm - do_fsync() calls filemap_fdatawait(), which ends up in
wait_on_page_writeback_range(), which appears to be checking the mapping
flags for errors. I wonder why that error is not being propagated then?
AFAICT both XFS and the fsync code are doing the right thing, but
somewhere the error has gone missing...
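For reference, the tail of wait_on_page_writeback_range() does something
like this (a simplified sketch from memory of the mm/filemap.c logic, not
a verbatim quote):

	/*
	 * After waiting on PageWriteback for each page in the range
	 * (noting -EIO if PageError was set), the mapping-wide error
	 * flags are checked and cleared:
	 */
	if (test_and_clear_bit(AS_ENOSPC, &mapping->flags))
		ret = -ENOSPC;
	if (test_and_clear_bit(AS_EIO, &mapping->flags))
		ret = -EIO;
	return ret;

Note the test-and-clear: the first waiter to check the mapping consumes
the error, so concurrent syncers of the same file won't all see it. I'm
not sure that accounts for what you're seeing, though.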
> I think I also saw a slab bug when running dbench with fault injection on.
> Running latest Linus kernel.
>
> bash-3.1# dbench -t10 -c ../client.txt 8
> dbench version 3.04 - Copyright Andrew Tridgell 1999-2004
>
> Running for 10 seconds with load '../client.txt' and minimum warmup 2 secs
> 8 clients started
> FAULT_INJECTION: forcing a failure
> Call Trace:
> 9e7bb548: [<601623ae>] random32+0xe/0x20
> 9e7bb558: [<60166389>] should_fail+0xd9/0x130
> 9e7bb588: [<60150294>] generic_make_request+0x304/0x4e0
> 9e7bb598: [<60062301>] mempool_alloc+0x51/0x130
> 9e7bb648: [<6015188f>] submit_bio+0x4f/0xe0
> 9e7bb698: [<6012b440>] _xfs_buf_ioapply+0x180/0x2a0
> 9e7bb6a0: [<6002f600>] default_wake_function+0x0/0x10
> 9e7bb6f8: [<6012bae1>] xfs_buf_iorequest+0x31/0x90
> 9e7bb718: [<60112f75>] xlog_bdstrat_cb+0x45/0x50
> 9e7bb738: [<60114135>] xlog_sync+0x195/0x440
> 9e7bb778: [<60114491>] xlog_state_release_iclog+0xb1/0xc0
> 9e7bb7a8: [<60114ca9>] xlog_write+0x539/0x550
> 9e7bb858: [<60114e60>] xfs_log_write+0x40/0x60
> 9e7bb888: [<6011fbaa>] _xfs_trans_commit+0x19a/0x360
> 9e7bb8b8: [<600838e2>] poison_obj+0x42/0x60
> 9e7bb8d0: [<60082cb3>] dbg_redzone1+0x13/0x30
> 9e7bb8e8: [<60083999>] cache_alloc_debugcheck_after+0x99/0x1c0
> 9e7bb918: [<6008517b>] kmem_cache_alloc+0x8b/0x100
> 9e7bb958: [<60128084>] kmem_zone_alloc+0x74/0xe0
> 9e7bb998: [<60082ad9>] kmem_cache_size+0x9/0x10
> 9e7bb9a8: [<60128124>] kmem_zone_zalloc+0x34/0x50
> 9e7bb9e8: [<60121e8b>] xfs_dir_ialloc+0x13b/0x2e0
> 9e7bba58: [<601f534b>] __down_write+0xb/0x10
> 9e7bbaa8: [<60125b9e>] xfs_mkdir+0x37e/0x4b0
> 9e7bbb38: [<601f5589>] _spin_unlock+0x9/0x10
> 9e7bbb78: [<601301a4>] xfs_vn_mknod+0xf4/0x1a0
> 9e7bbbd8: [<6013025e>] xfs_vn_mkdir+0xe/0x10
> 9e7bbbe8: [<60091010>] vfs_mkdir+0x90/0xc0
> 9e7bbc18: [<600934d6>] sys_mkdirat+0x106/0x120
> 9e7bbc88: [<6008629b>] filp_close+0x4b/0x80
> 9e7bbce8: [<60093503>] sys_mkdir+0x13/0x20
> 9e7bbcf8: [<60019a70>] handle_syscall+0x50/0x80
> 9e7bbd18: [<6002a10f>] userspace+0x44f/0x510
> 9e7bbfc8: [<60016792>] fork_handler+0x62/0x70
>
> I/O error in filesystem ("ram0") meta-data dev ram0 block 0x8002c
> ("xlog_iodone") error 5 buf count 32768
> xfs_force_shutdown(ram0,0x2) called from line 1056 of file
> /home/npiggin/usr/src/linux-2.6/fs/xfs/xfs_log.c. Return address = 0x000000006011370d
> Filesystem "ram0": Log I/O Error Detected. Shutting down filesystem: ram0
> Please umount the filesystem, and rectify the problem(s)
> xfs_force_shutdown(ram0,0x2) called from line 818 of file
> /home/npiggin/usr/src/linux-2.6/fs/xfs/xfs_log.c. Return address = 0x0000000060114e7d
> slab error in verify_redzone_free(): cache `xfs_log_ticket': double free detected
> Call Trace:
> 9e7bb998: [<6008372f>] __slab_error+0x1f/0x30
> 9e7bb9a8: [<60083cae>] cache_free_debugcheck+0x1ee/0x240
> 9e7bb9b0: [<60112ef0>] xlog_ticket_put+0x10/0x20
> 9e7bb9e8: [<60083f70>] kmem_cache_free+0x50/0xc0
> 9e7bba18: [<60112ef0>] xlog_ticket_put+0x10/0x20
> 9e7bba28: [<60114dc9>] xfs_log_done+0x59/0xb0
> 9e7bba68: [<6011f5de>] xfs_trans_cancel+0x7e/0x140
> 9e7bbaa8: [<60125a1e>] xfs_mkdir+0x1fe/0x4b0
> 9e7bbb38: [<601f5589>] _spin_unlock+0x9/0x10
> 9e7bbb78: [<601301a4>] xfs_vn_mknod+0xf4/0x1a0
> 9e7bbbd8: [<6013025e>] xfs_vn_mkdir+0xe/0x10
> 9e7bbbe8: [<60091010>] vfs_mkdir+0x90/0xc0
> 9e7bbc18: [<600934d6>] sys_mkdirat+0x106/0x120
> 9e7bbc88: [<6008629b>] filp_close+0x4b/0x80
> 9e7bbce8: [<60093503>] sys_mkdir+0x13/0x20
> 9e7bbcf8: [<60019a70>] handle_syscall+0x50/0x80
> 9e7bbd18: [<6002a10f>] userspace+0x44f/0x510
> 9e7bbfc8: [<60016792>] fork_handler+0x62/0x70

Now that is interesting. We've got a rolling transaction in progress, and
the commit of the first part of the transaction has got the I/O error.
That frees the transaction structure used during that commit, as well as
the ticket.

However, before we committed the initial transaction, we duplicated the
transaction structure to allow the transaction to continue to track all
the dirty objects in the first commit. That included duplicating the
pointer to the ticket. Then the EIO is returned to the mkdir code with
the duplicated transaction, which is then cancelled, and that frees the
transaction and the ticket it holds. However, we'd already freed the
ticket.
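Roughly, the sequence looks like this (a simplified sketch of the rolling
transaction flow - the calls are paraphrased from memory, not the exact
code):

	tp = xfs_trans_alloc(mp, ...);
	xfs_trans_reserve(tp, ...);	/* allocates tp->t_ticket */
	...
	ntp = xfs_trans_dup(tp);	/* ntp->t_ticket = tp->t_ticket -
					 * pointer copied, not the ticket */
	error = xfs_trans_commit(tp, ...);
					/* log I/O error: the commit path
					 * frees the ticket */
	...
	xfs_trans_cancel(ntp, ...);	/* cancel frees ntp->t_ticket via
					 * xfs_log_done() -> xlog_ticket_put()
					 * - the same ticket, freed twice */

That matches the xfs_trans_cancel() -> xfs_log_done() -> xlog_ticket_put()
frames in the slab trace above.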
Ok, we're only seeing this problem now because I recently modified the
ticket allocation to use a slab instead of a roll-your-own free list
structure that wouldn't have been poisoned. Nice to know that this change
did more than just remove code. ;)

This might take a little while to fix - a lot of code needs auditing -
but thanks for reporting the problem.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx