On Fri, 2012-03-02 at 05:43 -0500, Christoph Hellwig wrote: Hello, > From the trace this looks like we got incorrect results from xfs_imap, > which got some fairly significant changes between the two kernels > you tested. Can you check if 2.6.36 shows this issue? It's the first > kernel that had these big changes in inode lookup. I've tested more kernels and here are results (linus git tree, no additional patches): - v2.6.36 -- good - v2.6.38 -- good - v2.6.39 -- bad, I/O error log entry and locked processes: Mar 2 20:13:58 goscie kernel: [ 431.392301] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0xe6700 ("xfs_trans_read_buf") error 5 buf count 8192 Mar 2 20:13:58 goscie kernel: [ 431.427181] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5. Mar 2 20:13:59 goscie kernel: [ 433.086049] XFS (vda2): I/O error occurred: meta-data dev vda2 block 0xe6700 ("xfs_trans_read_buf") error 5 buf count 8192 Mar 2 20:13:59 goscie kernel: [ 433.088638] XFS (vda2): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5. Mar 2 20:16:47 goscie kernel: [ 600.584162] INFO: task cron:718 blocked for more than 120 seconds. Mar 2 20:16:47 goscie kernel: [ 600.585520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 2 20:16:47 goscie kernel: [ 600.587651] cron D ffff8800174d2180 0 718 1 0x00020000 Mar 2 20:16:47 goscie kernel: [ 600.589632] ffff8800174d2180 0000000000000082 ffffffff8100e0d1 ffff88001822c7a0 Mar 2 20:16:47 goscie kernel: [ 600.592651] 0000000000013d40 ffff880013831fd8 ffff880013831fd8 0000000000013d40 Mar 2 20:16:47 goscie kernel: [ 600.595718] ffff8800174d2180 ffff880013830010 ffff88001fc13d40 ffff88001821ed00 Mar 2 20:16:47 goscie kernel: [ 600.598314] Call Trace: Mar 2 20:16:47 goscie kernel: [ 600.598640] [<ffffffff8100e0d1>] ? sched_clock+0x5/0x8 Mar 2 20:16:47 goscie kernel: [ 600.599099] [<ffffffff8139166c>] ? schedule_timeout+0x2d/0xd7 Mar 2 20:16:47 goscie kernel: [ 600.599503] [<ffffffff81391fab>] ? __down_common+0x92/0xe3 Mar 2 20:16:47 goscie kernel: [ 600.599935] [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 2 20:16:47 goscie kernel: [ 600.600383] [<ffffffff8105f3bc>] ? down+0x27/0x37 Mar 2 20:16:47 goscie kernel: [ 600.600782] [<ffffffff811d6365>] ? xfs_buf_lock+0x64/0xb0 Mar 2 20:16:47 goscie kernel: [ 600.601179] [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 2 20:16:47 goscie kernel: [ 600.601581] [<ffffffff811d7cd6>] ? xfs_buf_get+0x5b/0x14f Mar 2 20:16:47 goscie kernel: [ 600.601986] [<ffffffff811d7de0>] ? xfs_buf_read+0x16/0xbf Mar 2 20:16:47 goscie kernel: [ 600.602360] [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573 Mar 2 20:16:47 goscie kernel: [ 600.602750] [<ffffffff811b65ee>] ? xfs_imap_to_bp+0x47/0x206 Mar 2 20:16:47 goscie kernel: [ 600.603171] [<ffffffff811d6c56>] ? xfs_buf_rele+0xfa/0x17a Mar 2 20:16:47 goscie kernel: [ 600.603567] [<ffffffff811b9531>] ? xfs_iread+0x59/0x1cc Mar 2 20:16:47 goscie kernel: [ 600.603983] [<ffffffff811b3727>] ? xfs_iget+0x3e4/0x611 Mar 2 20:16:47 goscie kernel: [ 600.612400] [<ffffffff811d0d80>] ? xfs_lookup+0xaa/0xd9 Mar 2 20:16:47 goscie kernel: [ 600.612757] [<ffffffff811dc6f3>] ? xfs_vn_lookup+0x3d/0x7c Mar 2 20:16:47 goscie kernel: [ 600.613142] [<ffffffff810fed3f>] ? d_alloc_and_lookup+0x43/0x60 Mar 2 20:16:47 goscie kernel: [ 600.613611] [<ffffffff8110061c>] ? walk_component+0x161/0x30c Mar 2 20:16:47 goscie kernel: [ 600.613967] [<ffffffff81101683>] ? path_lookupat+0xad/0x33f Mar 2 20:16:47 goscie kernel: [ 600.614398] [<ffffffff81033b5a>] ? update_cfs_shares+0x7c/0x149 Mar 2 20:16:47 goscie kernel: [ 600.614779] [<ffffffff81101933>] ? do_path_lookup+0x1e/0x9a Mar 2 20:16:47 goscie kernel: [ 600.615125] [<ffffffff810ff44d>] ? getname_flags+0x14f/0x1c6 Mar 2 20:16:47 goscie kernel: [ 600.615496] [<ffffffff811023df>] ? user_path_at+0x48/0x7a Mar 2 20:16:47 goscie kernel: [ 600.615857] [<ffffffff8105e7f5>] ? hrtimer_cancel+0xc/0x16 Mar 2 20:16:47 goscie kernel: [ 600.616220] [<ffffffff81391de2>] ? do_nanosleep+0x77/0xae Mar 2 20:16:47 goscie kernel: [ 600.616583] [<ffffffff811078e7>] ? dput+0x84/0xe8 Mar 2 20:16:47 goscie kernel: [ 600.616909] [<ffffffff810fa00b>] ? vfs_fstatat+0x43/0x70 Mar 2 20:16:47 goscie kernel: [ 600.617247] [<ffffffff8102dbc0>] ? sys32_stat64+0x11/0x29 Mar 2 20:16:47 goscie kernel: [ 600.617621] [<ffffffff81399180>] ? cstar_dispatch+0x7/0x2e Mar 2 20:16:47 goscie kernel: [ 600.617984] INFO: task xfsdump:1849 blocked for more than 120 seconds. Mar 2 20:16:47 goscie kernel: [ 600.618346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 2 20:16:47 goscie kernel: [ 600.618964] xfsdump D ffff8800178936e0 0 1849 1826 0x00020000 Mar 2 20:16:47 goscie kernel: [ 600.619448] ffff8800178936e0 0000000000000086 ffff880000000000 ffffffff8160b020 Mar 2 20:16:47 goscie kernel: [ 600.620276] 0000000000013d40 ffff88000d841fd8 ffff88000d841fd8 0000000000013d40 Mar 2 20:16:47 goscie kernel: [ 600.621068] ffff8800178936e0 ffff88000d840010 ffffea000053c950 0000000000000008 Mar 2 20:16:47 goscie kernel: [ 600.621814] Call Trace: Mar 2 20:16:47 goscie kernel: [ 600.622155] [<ffffffff8139166c>] ? schedule_timeout+0x2d/0xd7 Mar 2 20:16:47 goscie kernel: [ 600.622509] [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573 Mar 2 20:16:47 goscie kernel: [ 600.622888] [<ffffffff8119a361>] ? xfs_btree_check_sblock+0x64/0x15e Mar 2 20:16:47 goscie kernel: [ 600.623308] [<ffffffff81391fab>] ? __down_common+0x92/0xe3 Mar 2 20:16:47 goscie kernel: [ 600.623650] [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 2 20:16:47 goscie kernel: [ 600.624065] [<ffffffff8105f3bc>] ? down+0x27/0x37 Mar 2 20:16:47 goscie kernel: [ 600.624398] [<ffffffff811d6365>] ? xfs_buf_lock+0x64/0xb0 Mar 2 20:16:47 goscie kernel: [ 600.624739] [<ffffffff811d7ba0>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 2 20:16:47 goscie kernel: [ 600.625127] [<ffffffff811d7cd6>] ? xfs_buf_get+0x5b/0x14f Mar 2 20:16:47 goscie kernel: [ 600.625475] [<ffffffff811d7de0>] ? xfs_buf_read+0x16/0xbf Mar 2 20:16:47 goscie kernel: [ 600.625820] [<ffffffff811ce10a>] ? xfs_trans_read_buf+0x52/0x573 Mar 2 20:16:47 goscie kernel: [ 600.626198] [<ffffffff811b65ee>] ? xfs_imap_to_bp+0x47/0x206 Mar 2 20:16:47 goscie kernel: [ 600.626550] [<ffffffff811b9531>] ? xfs_iread+0x59/0x1cc Mar 2 20:16:47 goscie kernel: [ 600.626888] [<ffffffff811b3727>] ? xfs_iget+0x3e4/0x611 Mar 2 20:16:47 goscie kernel: [ 600.627277] [<ffffffff811d3715>] ? kmem_alloc+0x28/0x9e Mar 2 20:16:47 goscie kernel: [ 600.627641] [<ffffffff811bc59b>] ? xfs_bulkstat_one_int+0x9a/0x318 Mar 2 20:16:47 goscie kernel: [ 600.628000] [<ffffffff8117f3b4>] ? xfs_bstime_store_compat+0x4d/0x4d Mar 2 20:16:47 goscie kernel: [ 600.628433] [<ffffffff8117fb58>] ? xfs_bulkstat_one_compat+0x17/0x1c Mar 2 20:16:47 goscie kernel: [ 600.628857] [<ffffffff811bc350>] ? xfs_bulkstat+0x58c/0x73d Mar 2 20:16:47 goscie kernel: [ 600.629220] [<ffffffff8117fb41>] ? xfs_compat_fssetdm_by_handle+0xc7/0xc7 Mar 2 20:16:47 goscie kernel: [ 600.629635] [<ffffffff8118006a>] ? xfs_compat_ioc_bulkstat+0x162/0x1cf Mar 2 20:16:47 goscie kernel: [ 600.630080] [<ffffffff8103680c>] ? task_rq_lock+0x47/0x7a Mar 2 20:16:47 goscie kernel: [ 600.630463] [<ffffffff81180462>] ? xfs_file_compat_ioctl+0x38b/0x465 Mar 2 20:16:47 goscie kernel: [ 600.630829] [<ffffffff81030962>] ? __wake_up_common+0x41/0x78 Mar 2 20:16:47 goscie kernel: [ 600.631177] [<ffffffff810366c2>] ? __wake_up+0x35/0x46 Mar 2 20:16:47 goscie kernel: [ 600.631537] [<ffffffff812915c0>] ? pty_write+0x48/0x53 Mar 2 20:16:47 goscie kernel: [ 600.631895] [<ffffffff8103e770>] ? perf_event_task_sched_out+0x48/0x54 Mar 2 20:16:47 goscie kernel: [ 600.632285] [<ffffffff8128cc83>] ? n_tty_write+0x322/0x360 Mar 2 20:16:47 goscie kernel: [ 600.632635] [<ffffffff8112dec3>] ? compat_sys_ioctl+0x1bb/0xfb4 Mar 2 20:16:47 goscie kernel: [ 600.632993] [<ffffffff810086b8>] ? __switch_to+0x1c9/0x288 Mar 2 20:16:47 goscie kernel: [ 600.633340] [<ffffffff810379f3>] ? pick_next_task_fair+0xf9/0x10a Mar 2 20:16:47 goscie kernel: [ 600.633707] [<ffffffff8102602a>] ? pvclock_clocksource_read+0x46/0xb4 Mar 2 20:16:47 goscie kernel: [ 600.634079] [<ffffffff810622ff>] ? timekeeping_get_ns+0xe/0x2e Mar 2 20:16:47 goscie kernel: [ 600.634430] [<ffffffff81062975>] ? getnstimeofday+0x4d/0x80 Mar 2 20:16:47 goscie kernel: [ 600.634778] [<ffffffff810629f5>] ? do_gettimeofday+0x10/0x2e Mar 2 20:16:47 goscie kernel: [ 600.635136] [<ffffffff81399180>] ? cstar_dispatch+0x7/0x2e - git revision 243b422af9ea9af4ead07a8ad54c90d4f9b6081a -- good It's one patch before "xfs: preallocation transactions do not need to be synchronous" - git revision c5850150d0b9ae16840c5d9846f9d5a759996a15: Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: stop using the page cache to back the buffer cache xfs: register the inode cache shrinker before quotachecks xfs: xfs_trans_read_buf() should return an error on failure xfs: introduce inode cluster buffer trylocks for xfs_iflush vmap: flush vmap aliases when mapping fails xfs: preallocation transactions do not need to be synchronous Hung xfsdump (only?) without any additional log entries (CONFIG_XFS_DEBUG is set): Mar 3 12:05:57 goscie kernel: [ 960.816216] INFO: task xfsdump:1519 blocked for more than 120 seconds. Mar 3 12:05:57 goscie kernel: [ 960.818113] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 3 12:05:57 goscie kernel: [ 960.820223] xfsdump D ffff88001e196950 0 1519 1490 0x00020000 Mar 3 12:05:57 goscie kernel: [ 960.821572] ffff88001e196950 0000000000000086 ffff880000000000 ffffffff8160b020 Mar 3 12:05:57 goscie kernel: [ 960.823618] 0000000000013d00 ffff880009903fd8 ffff880009903fd8 0000000000013d00 Mar 3 12:05:57 goscie kernel: [ 960.825707] ffff88001e196950 ffff880009902010 0000000100000000 0000000000000008 Mar 3 12:05:57 goscie kernel: [ 960.827746] Call Trace: Mar 3 12:05:57 goscie kernel: [ 960.828672] [<ffffffff81391493>] ? schedule_timeout+0x2d/0xd7 Mar 3 12:05:57 goscie kernel: [ 960.829937] [<ffffffff811ce4d2>] ? xfs_trans_read_buf+0x52/0x573 Mar 3 12:05:57 goscie kernel: [ 960.831118] [<ffffffff8119a8a5>] ? xfs_btree_check_sblock+0x64/0x15e Mar 3 12:05:57 goscie kernel: [ 960.831459] [<ffffffff81391dd2>] ? __down_common+0x92/0xe3 Mar 3 12:05:57 goscie kernel: [ 960.831780] [<ffffffff811d7fab>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 3 12:05:57 goscie kernel: [ 960.832125] [<ffffffff8105f2fc>] ? down+0x27/0x37 Mar 3 12:05:57 goscie kernel: [ 960.832424] [<ffffffff811d6776>] ? xfs_buf_lock+0x7c/0xc8 Mar 3 12:05:57 goscie kernel: [ 960.832739] [<ffffffff811d7fab>] ? _xfs_buf_find+0x1ed/0x2c8 Mar 3 12:05:57 goscie kernel: [ 960.833068] [<ffffffff811d80e1>] ? xfs_buf_get+0x5b/0x14f Mar 3 12:05:57 goscie kernel: [ 960.833412] [<ffffffff811d81eb>] ? xfs_buf_read+0x16/0xbf Mar 3 12:05:57 goscie kernel: [ 960.833737] [<ffffffff811ce4d2>] ? xfs_trans_read_buf+0x52/0x573 Mar 3 12:05:57 goscie kernel: [ 960.834071] [<ffffffff811b6b2a>] ? xfs_imap_to_bp+0x47/0x206 Mar 3 12:05:57 goscie kernel: [ 960.834396] [<ffffffff811b9a6d>] ? xfs_iread+0x59/0x1cc Mar 3 12:05:57 goscie kernel: [ 960.834706] [<ffffffff811b3c63>] ? xfs_iget+0x3e4/0x611 Mar 3 12:05:57 goscie kernel: [ 960.835022] [<ffffffff811d3b15>] ? kmem_alloc+0x60/0x9e Mar 3 12:05:57 goscie kernel: [ 960.835338] [<ffffffff811bca6b>] ? xfs_bulkstat_one_int+0x9a/0x318 Mar 3 12:05:57 goscie kernel: [ 960.835677] [<ffffffff8117f8d4>] ? xfs_bstime_store_compat+0x4d/0x4d Mar 3 12:05:57 goscie kernel: [ 960.836029] [<ffffffff81180078>] ? xfs_bulkstat_one_compat+0x17/0x1c Mar 3 12:05:57 goscie kernel: [ 960.836377] [<ffffffff811bc820>] ? xfs_bulkstat+0x58c/0x73d Mar 3 12:05:57 goscie kernel: [ 960.836697] [<ffffffff81180061>] ? xfs_compat_fssetdm_by_handle+0xc7/0xc7 Mar 3 12:05:57 goscie kernel: [ 960.837049] [<ffffffff8118058a>] ? xfs_compat_ioc_bulkstat+0x162/0x1cf Mar 3 12:05:57 goscie kernel: [ 960.837439] [<ffffffff810b1d7a>] ? generic_file_aio_read+0x58d/0x5a2 Mar 3 12:05:57 goscie kernel: [ 960.837790] [<ffffffff81180982>] ? xfs_file_compat_ioctl+0x38b/0x465 Mar 3 12:05:57 goscie kernel: [ 960.838132] [<ffffffff811b3219>] ? xfs_iunlock+0xea/0x147 Mar 3 12:05:57 goscie kernel: [ 960.838467] [<ffffffff811d1221>] ? xfs_free_eofblocks+0xaa/0x222 Mar 3 12:05:57 goscie kernel: [ 960.838810] [<ffffffff8112e413>] ? compat_sys_ioctl+0x1bb/0xfb4 Mar 3 12:05:57 goscie kernel: [ 960.846656] [<ffffffff8102602a>] ? pvclock_clocksource_read+0x46/0xb4 Mar 3 12:05:57 goscie kernel: [ 960.847015] [<ffffffff8106222f>] ? timekeeping_get_ns+0xe/0x2e Mar 3 12:05:57 goscie kernel: [ 960.847343] [<ffffffff810628a5>] ? getnstimeofday+0x4d/0x80 Mar 3 12:05:57 goscie kernel: [ 960.847664] [<ffffffff81062925>] ? do_gettimeofday+0x10/0x2e Mar 3 12:05:57 goscie kernel: [ 960.847994] [<ffffffff81398fc0>] ? cstar_dispatch+0x7/0x2e I don't know if it's the same bug - there is no assertion nor I/O error log entry... Regards, Rafał Kupka _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs