On Fri, Apr 08, 2016 at 11:28:41AM +0800, Eryu Guan wrote:
> On Fri, Apr 08, 2016 at 09:37:45AM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is the second version of this patch set, first posted and
> > described here:
> > 
> > http://oss.sgi.com/archives/xfs/2016-04/msg00069.html
> 
> Just a quick note here, I'm testing the v1 patchset right now: v4.6-rc2
> kernel + v1 patch, config file based on the rhel7 debug kernel config.
> 
> The test is the same as the original reproducer (long term fsstress run
> on XFS, exported from NFS). The test on the x86_64 host has been running
> for two days and everything looks fine. The test on the ppc64 host has
> been running for a few hours and I noticed a lock issue and a few
> warnings. I'm not sure if it's related to the patches or even to XFS yet
> (I need to run the test on a stock -rc2 kernel to be sure), but I'll
> just post the logs here for reference.

Has the original problem ever been reproduced on an upstream kernel?

FWIW, my rhel kernel-based test is still running well, approaching ~48
hours. I've seen some lockdep messages (bad unlock balance), but IIRC
I've been seeing those from the start, so I haven't been paying much
attention to them while digging into the core problem.

> [ 1911.626286] ======================================================
> [ 1911.626291] [ INFO: possible circular locking dependency detected ]
> [ 1911.626297] 4.6.0-rc2.debug+ #1 Not tainted
> [ 1911.626301] -------------------------------------------------------
> [ 1911.626306] nfsd/7402 is trying to acquire lock:
> [ 1911.626311]  (&s->s_sync_lock){+.+.+.}, at: [<c0000000003585f0>] .sync_inodes_sb+0xe0/0x230
> [ 1911.626327] 
> [ 1911.626327] but task is already holding lock:
> [ 1911.626333]  (sb_internal){.+.+.+}, at: [<c00000000031a780>] .__sb_start_write+0x90/0x130
> [ 1911.626346] 
> [ 1911.626346] which lock already depends on the new lock.
> [ 1911.626346] 
> [ 1911.626353] 
> [ 1911.626353] the existing dependency chain (in reverse order) is:
> [ 1911.626358] 
...
> [ 1911.627134]  Possible unsafe locking scenario:
> [ 1911.627134] 
> [ 1911.627139]        CPU0                    CPU1
> [ 1911.627143]        ----                    ----
> [ 1911.627147]   lock(sb_internal);
> [ 1911.627153]                                lock(&s->s_sync_lock);
> [ 1911.627160]                                lock(sb_internal);
> [ 1911.627166]   lock(&s->s_sync_lock);
> [ 1911.627172] 
> [ 1911.627172]  *** DEADLOCK ***
> [ 1911.627172] 
...

We actually have a report of this one on the list:

http://oss.sgi.com/archives/xfs/2016-04/msg00001.html

... so I don't think it's related to this series. I believe I've seen
this once or twice when testing something completely unrelated, as well.
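For anyone reading along, the scenario table in the splat is the classic
ABBA inversion: one CPU takes sb_internal and then wants s_sync_lock,
while the other takes the same two locks in the opposite order, so each
can end up waiting on the other. A minimal userspace sketch of that
shape is below; it just borrows the lock names from the splat and uses
plain pthread mutexes as stand-ins (the real kernel locks are not simple
mutexes), so treat it as an illustration only:

/* abba.c - userspace illustration of the inversion lockdep reports.
 * The names are borrowed from the splat; these are ordinary pthread
 * mutexes, not the kernel's freeze protection or sync lock.
 * Build: cc -pthread -o abba abba.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t sb_internal = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t s_sync_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cpu0(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sb_internal);	/* lock(sb_internal); */
	pthread_mutex_lock(&s_sync_lock);	/* lock(&s->s_sync_lock); */
	pthread_mutex_unlock(&s_sync_lock);
	pthread_mutex_unlock(&sb_internal);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&s_sync_lock);	/* lock(&s->s_sync_lock); */
	pthread_mutex_lock(&sb_internal);	/* lock(sb_internal); */
	pthread_mutex_unlock(&sb_internal);
	pthread_mutex_unlock(&s_sync_lock);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0, NULL);
	pthread_create(&t1, NULL, cpu1, NULL);

	/* With unlucky timing both threads block in their second
	 * lock call and these joins never return. */
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	printf("lucky this time, no deadlock\n");
	return 0;
}

Run that in a loop and it will eventually wedge with both threads stuck
in their second lock acquisition, which is exactly the state lockdep is
warning about before it can actually happen.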
> [ 2046.852739] kworker/dying (399) used greatest stack depth: 4352 bytes left
> [ 2854.687381] XFS: Assertion failed: buffer_mapped(bh), file: fs/xfs/xfs_aops.c, line: 780
> [ 2854.687434] ------------[ cut here ]------------
> [ 2854.687488] WARNING: CPU: 5 PID: 28924 at fs/xfs/xfs_message.c:105 .asswarn+0x2c/0x40 [xfs]
...
> [ 2854.687997] ---[ end trace 872ac2709186f780 ]---

These asserts look new to me, however. It would be interesting to see if
these reproduce independently.

Brian

> [ 2854.688001] XFS: Assertion failed: buffer_mapped(bh), file: fs/xfs/xfs_aops.c, line: 780
> [ 2854.688022] ------------[ cut here ]------------
> [ 2854.688072] WARNING: CPU: 5 PID: 28924 at fs/xfs/xfs_message.c:105 .asswarn+0x2c/0x40 [xfs]
> [ 2854.688076] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mod loop sg pseries_rng nfsd auth_rpcgss nfs_acl lockd sunrpc grace ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> [ 2854.688110] CPU: 5 PID: 28924 Comm: kworker/u32:4 Tainted: G        W       4.6.0-rc2.debug+ #1
> [ 2854.688116] Workqueue: writeback .wb_workfn (flush-253:0)
> [ 2854.688121] task: c0000001e6d28380 ti: c0000000fe3ac000 task.ti: c0000000fe3ac000
> [ 2854.688126] NIP: d0000000066ddafc LR: d0000000066ddafc CTR: c0000000004dd880
> [ 2854.688131] REGS: c0000000fe3aeeb0 TRAP: 0700   Tainted: G        W        (4.6.0-rc2.debug+)
> [ 2854.688135] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI,TM[E]> CR: 48002048 XER: 0000000d
> [ 2854.688153] CFAR: d0000000066dd890 SOFTE: 1
> GPR00: d0000000066ddafc c0000000fe3af130 d000000006765850 ffffffffffffffea
> GPR04: 000000000000000a c0000000fe3aef50 00000000000000d1 ffffffffffffffc0
> GPR08: 0000000000000000 0000000000000021 00000000ffffffd1 d000000006741dc0
> GPR12: c0000000004dd880 c00000000e822d00 c0000000fe3af4e0 0000000000000001
> GPR16: c0000000fe3af6f0 0000000000000002 0000000000000000 0000000000000007
> GPR20: 0000000000000003 c0000000fe3af4f0 0000000000000000 c0000000fe3af210
> GPR24: 0000000000000004 0000000000001000 0000000000160000 f000000000f0e100
> GPR28: c0000004055d18c8 c000000016cde430 0000000000158000 c0000004055d1588
> [ 2854.688268] NIP [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs]
> [ 2854.688318] LR [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs]
> [ 2854.688321] Call Trace:
> [ 2854.688369] [c0000000fe3af130] [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs] (unreliable)
> [ 2854.688423] [c0000000fe3af1a0] [d0000000066a9104] .xfs_do_writepage+0x414/0x930 [xfs]
> [ 2854.688430] [c0000000fe3af2b0] [c00000000025df6c] .write_cache_pages+0x5fc/0x820
> [ 2854.688481] [c0000000fe3af470] [d0000000066a8a5c] .xfs_vm_writepages+0x8c/0xd0 [xfs]
> [ 2854.688487] [c0000000fe3af540] [c00000000025f62c] .do_writepages+0x3c/0x70
> [ 2854.688493] [c0000000fe3af5b0] [c00000000035b1ec] .__writeback_single_inode+0x5bc/0xd50
> [ 2854.688499] [c0000000fe3af680] [c00000000035c3c0] .writeback_sb_inodes+0x380/0x730
> [ 2854.688505] [c0000000fe3af7f0] [c00000000035ca44] .wb_writeback+0x194/0x920
> [ 2854.688510] [c0000000fe3af960] [c00000000035ddcc] .wb_workfn+0x19c/0xa40
> [ 2854.688516] [c0000000fe3afad0] [c0000000000dfc74] .process_one_work+0x264/0x8f0
> [ 2854.688522] [c0000000fe3afbc0] [c0000000000e0388] .worker_thread+0x88/0x520
> [ 2854.688528] [c0000000fe3afcb0] [c0000000000e9ac4] .kthread+0x114/0x140
> [ 2854.688534] [c0000000fe3afe30] [c000000000009578] .ret_from_kernel_thread+0x58/0x60
> [ 2854.688539] Instruction dump:
> [ 2854.688543] 60420000 7c0802a6 3d420000 7c691b78 7c862378 e88abe38 7ca72b78 38600000
> [ 2854.688555] 7d254b78 f8010010 f821ff91 4bfffcf9 <0fe00000> 38210070 e8010010 7c0803a6
> [ 2854.688568] ---[ end trace 872ac2709186f781 ]---
> 
> [then the XFS warning repeated a few times, triggered by different
> pids]
> 
> Thanks,
> Eryu
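One more note on the trace above: the WARNING comes from .asswarn at
xfs_message.c:105, so this kernel evidently maps XFS's ASSERT() to a
warn-and-continue path rather than a BUG. That is why a failed assert
prints the message and a backtrace, writeback carries on, and the same
warning can then repeat from different PIDs. A rough userspace
approximation of that warn-only ASSERT() behaviour, for illustration
only and not the kernel's actual implementation:

/* assert_warn.c - rough approximation of a warn-only ASSERT();
 * an illustration, not the kernel code.
 */
#include <stdio.h>

static void asswarn(const char *expr, const char *file, int line)
{
	/* The kernel version prints this as an XFS warning and then
	 * emits a WARNING with a backtrace; execution continues. */
	fprintf(stderr, "XFS: Assertion failed: %s, file: %s, line: %d\n",
		expr, file, line);
}

#define ASSERT(expr)						\
	do {							\
		if (!(expr))					\
			asswarn(#expr, __FILE__, __LINE__);	\
	} while (0)

int main(void)
{
	int mapped = 0;		/* stand-in for buffer_mapped(bh) */

	ASSERT(mapped);		/* warns and keeps going */
	puts("still running after the failed assert");
	return 0;
}

That matches what the log shows: repeated warnings from ongoing
writeback rather than a single crash.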