On Fri, Apr 08, 2016 at 11:28:41AM +0800, Eryu Guan wrote:
> On Fri, Apr 08, 2016 at 09:37:45AM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is the second version of this patch set, first posted and
> > described here:
> > 
> > http://oss.sgi.com/archives/xfs/2016-04/msg00069.html
> 
> Just a quick note here, I'm testing the v1 patchset right now: v4.6-rc2
> kernel + v1 patch, config file based on the rhel7 debug kernel config.
> 
> The test is the same as the original reproducer (long term fsstress run
> on XFS, exported from NFS). The test on the x86_64 host has been running
> for two days and everything looks fine. The test on the ppc64 host has
> been running for a few hours and I noticed a lock issue and a few
> warnings. I'm not sure if it's related to the patches or even to XFS yet
> (I need to run the test on a stock -rc2 kernel to be sure), but I'll
> just post the logs here for reference.

Has the original problem ever been reproduced on an upstream kernel?

FWIW, my rhel kernel-based test is still running well, approaching ~48
hours. I've seen some lockdep messages (bad unlock balance), but IIRC
I've been seeing those from the start, so I haven't been paying much
attention to them while digging into the core problem.

> [ 1911.626286] ======================================================
> [ 1911.626291] [ INFO: possible circular locking dependency detected ]
> [ 1911.626297] 4.6.0-rc2.debug+ #1 Not tainted
> [ 1911.626301] -------------------------------------------------------
> [ 1911.626306] nfsd/7402 is trying to acquire lock:
> [ 1911.626311]  (&s->s_sync_lock){+.+.+.}, at: [<c0000000003585f0>] .sync_inodes_sb+0xe0/0x230
> [ 1911.626327] 
> [ 1911.626327] but task is already holding lock:
> [ 1911.626333]  (sb_internal){.+.+.+}, at: [<c00000000031a780>] .__sb_start_write+0x90/0x130
> [ 1911.626346] 
> [ 1911.626346] which lock already depends on the new lock.
> [ 1911.626346] 
> [ 1911.626353] 
> [ 1911.626353] the existing dependency chain (in reverse order) is:
> [ 1911.626358] 
...
> [ 1911.627134]  Possible unsafe locking scenario:
> [ 1911.627134] 
> [ 1911.627139]        CPU0                    CPU1
> [ 1911.627143]        ----                    ----
> [ 1911.627147]   lock(sb_internal);
> [ 1911.627153]                                lock(&s->s_sync_lock);
> [ 1911.627160]                                lock(sb_internal);
> [ 1911.627166]   lock(&s->s_sync_lock);
> [ 1911.627172] 
> [ 1911.627172]  *** DEADLOCK ***
> [ 1911.627172] 
...

We actually have a report of this one on the list:

http://oss.sgi.com/archives/xfs/2016-04/msg00001.html

... so I don't think it's related to this series. I believe I've seen
this once or twice when testing something completely unrelated, as well.
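For anyone reading along, the scenario table in the splat is the classic
ABBA inversion: one CPU takes sb_internal and then wants s_sync_lock,
while the other takes the same two locks in the opposite order, so each
can end up waiting on the other. A minimal userspace sketch of that
shape is below; it just borrows the lock names from the splat and uses
plain pthread mutexes as stand-ins (the real kernel locks are not simple
mutexes), so treat it as an illustration only:

/* abba.c - userspace illustration of the inversion lockdep reports.
 * The names are borrowed from the splat; these are ordinary pthread
 * mutexes, not the kernel's freeze protection or sync lock.
 * Build: cc -pthread -o abba abba.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t sb_internal = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t s_sync_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cpu0(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sb_internal);	/* lock(sb_internal); */
	pthread_mutex_lock(&s_sync_lock);	/* lock(&s->s_sync_lock); */
	pthread_mutex_unlock(&s_sync_lock);
	pthread_mutex_unlock(&sb_internal);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&s_sync_lock);	/* lock(&s->s_sync_lock); */
	pthread_mutex_lock(&sb_internal);	/* lock(sb_internal); */
	pthread_mutex_unlock(&sb_internal);
	pthread_mutex_unlock(&s_sync_lock);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0, NULL);
	pthread_create(&t1, NULL, cpu1, NULL);

	/* With unlucky timing both threads block in their second
	 * lock call and these joins never return. */
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	printf("lucky this time, no deadlock\n");
	return 0;
}

Run that in a loop and it will eventually wedge with both threads stuck
in their second lock acquisition, which is exactly the state lockdep is
warning about before it can actually happen.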
> [ 2046.852739] kworker/dying (399) used greatest stack depth: 4352 bytes left
> [ 2854.687381] XFS: Assertion failed: buffer_mapped(bh), file: fs/xfs/xfs_aops.c, line: 780
> [ 2854.687434] ------------[ cut here ]------------
> [ 2854.687488] WARNING: CPU: 5 PID: 28924 at fs/xfs/xfs_message.c:105 .asswarn+0x2c/0x40 [xfs]
...
> [ 2854.687997] ---[ end trace 872ac2709186f780 ]---

These asserts look new to me, however. It would be interesting to see if
these reproduce independently.

Brian

> [ 2854.688001] XFS: Assertion failed: buffer_mapped(bh), file: fs/xfs/xfs_aops.c, line: 780
> [ 2854.688022] ------------[ cut here ]------------
> [ 2854.688072] WARNING: CPU: 5 PID: 28924 at fs/xfs/xfs_message.c:105 .asswarn+0x2c/0x40 [xfs]
> [ 2854.688076] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mod loop sg pseries_rng nfsd auth_rpcgss nfs_acl lockd sunrpc grace ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> [ 2854.688110] CPU: 5 PID: 28924 Comm: kworker/u32:4 Tainted: G        W       4.6.0-rc2.debug+ #1
> [ 2854.688116] Workqueue: writeback .wb_workfn (flush-253:0)
> [ 2854.688121] task: c0000001e6d28380 ti: c0000000fe3ac000 task.ti: c0000000fe3ac000
> [ 2854.688126] NIP: d0000000066ddafc LR: d0000000066ddafc CTR: c0000000004dd880
> [ 2854.688131] REGS: c0000000fe3aeeb0 TRAP: 0700   Tainted: G        W        (4.6.0-rc2.debug+)
> [ 2854.688135] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI,TM[E]> CR: 48002048 XER: 0000000d
> [ 2854.688153] CFAR: d0000000066dd890 SOFTE: 1
> GPR00: d0000000066ddafc c0000000fe3af130 d000000006765850 ffffffffffffffea
> GPR04: 000000000000000a c0000000fe3aef50 00000000000000d1 ffffffffffffffc0
> GPR08: 0000000000000000 0000000000000021 00000000ffffffd1 d000000006741dc0
> GPR12: c0000000004dd880 c00000000e822d00 c0000000fe3af4e0 0000000000000001
> GPR16: c0000000fe3af6f0 0000000000000002 0000000000000000 0000000000000007
> GPR20: 0000000000000003 c0000000fe3af4f0 0000000000000000 c0000000fe3af210
> GPR24: 0000000000000004 0000000000001000 0000000000160000 f000000000f0e100
> GPR28: c0000004055d18c8 c000000016cde430 0000000000158000 c0000004055d1588
> [ 2854.688268] NIP [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs]
> [ 2854.688318] LR [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs]
> [ 2854.688321] Call Trace:
> [ 2854.688369] [c0000000fe3af130] [d0000000066ddafc] .asswarn+0x2c/0x40 [xfs] (unreliable)
> [ 2854.688423] [c0000000fe3af1a0] [d0000000066a9104] .xfs_do_writepage+0x414/0x930 [xfs]
> [ 2854.688430] [c0000000fe3af2b0] [c00000000025df6c] .write_cache_pages+0x5fc/0x820
> [ 2854.688481] [c0000000fe3af470] [d0000000066a8a5c] .xfs_vm_writepages+0x8c/0xd0 [xfs]
> [ 2854.688487] [c0000000fe3af540] [c00000000025f62c] .do_writepages+0x3c/0x70
> [ 2854.688493] [c0000000fe3af5b0] [c00000000035b1ec] .__writeback_single_inode+0x5bc/0xd50
> [ 2854.688499] [c0000000fe3af680] [c00000000035c3c0] .writeback_sb_inodes+0x380/0x730
> [ 2854.688505] [c0000000fe3af7f0] [c00000000035ca44] .wb_writeback+0x194/0x920
> [ 2854.688510] [c0000000fe3af960] [c00000000035ddcc] .wb_workfn+0x19c/0xa40
> [ 2854.688516] [c0000000fe3afad0] [c0000000000dfc74] .process_one_work+0x264/0x8f0
> [ 2854.688522] [c0000000fe3afbc0] [c0000000000e0388] .worker_thread+0x88/0x520
> [ 2854.688528] [c0000000fe3afcb0] [c0000000000e9ac4] .kthread+0x114/0x140
> [ 2854.688534] [c0000000fe3afe30] [c000000000009578] .ret_from_kernel_thread+0x58/0x60
> [ 2854.688539] Instruction dump:
> [ 2854.688543] 60420000 7c0802a6 3d420000 7c691b78 7c862378 e88abe38 7ca72b78 38600000
> [ 2854.688555] 7d254b78 f8010010 f821ff91 4bfffcf9 <0fe00000> 38210070 e8010010 7c0803a6
> [ 2854.688568] ---[ end trace 872ac2709186f781 ]---
> 
> [then the XFS warning repeated a few times, triggered by different
> pids]
> 
> Thanks,
> Eryu
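One more note on the trace above: the WARNING comes from .asswarn at
xfs_message.c:105, so this kernel evidently maps XFS's ASSERT() to a
warn-and-continue path rather than a BUG. That is why a failed assert
prints the message and a backtrace, writeback carries on, and the same
warning can then repeat from different PIDs. A rough userspace
approximation of that warn-only ASSERT() behaviour, for illustration
only and not the kernel's actual implementation:

/* assert_warn.c - rough approximation of a warn-only ASSERT();
 * an illustration, not the kernel code.
 */
#include <stdio.h>

static void asswarn(const char *expr, const char *file, int line)
{
	/* The kernel version prints this as an XFS warning and then
	 * emits a WARNING with a backtrace; execution continues. */
	fprintf(stderr, "XFS: Assertion failed: %s, file: %s, line: %d\n",
		expr, file, line);
}

#define ASSERT(expr)						\
	do {							\
		if (!(expr))					\
			asswarn(#expr, __FILE__, __LINE__);	\
	} while (0)

int main(void)
{
	int mapped = 0;		/* stand-in for buffer_mapped(bh) */

	ASSERT(mapped);		/* warns and keeps going */
	puts("still running after the failed assert");
	return 0;
}

That matches what the log shows: repeated warnings from ongoing
writeback rather than a single crash.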