On Tue, Sep 23, 2014 at 02:33:06PM +0100, Weston Andros Adamson wrote:
> On Sep 23, 2014, at 9:03 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> > I've been running into the following warning on an arm64 system running
> > 3.17-rc6 with 64k pages. I've been unable to reproduce with a smaller page
> > size (4k).
> >
> > I don't yet have a concrete reproducer, but I've seen it hit a few times
> > today just running a machine with an NFS root filesystem and using ssh.
> > The warning seems to happen in parallel on the two CPUs, but I'm pretty
> > confident that our test_and_clear_bit implementation has the relevant
> > atomic instructions and memory barriers.
> >
> > Any ideas?
>
> So it looks like we’re either calling nfs_inode_remove_request twice on a request,
> or somehow not grabbing the inode reference for some request that is in the async
> write path. It’s interesting that these come in pairs - that has to mean something!

Indeed. I have 6 CPUs on this system too, so it's not a per-cpu thing.

> Any more info on how to reproduce this would be really great. Unfortunately I don’t
> have access to an arm64 system.

I've not spotted a pattern yet, other than using 64k pages. If I manage to
get a reproducer, I'll let you know.

> If it’s possible, could we get a packet trace around when this happens? This is pure
> speculation, but this might have something to do with the resend path - a commit fails
> and all the requests on the commit list have to be resent.

Sure, once I can reproduce it reliably, I'll try to do that.

> Have you noticed any side effects from this? That WARN_ON_ONCE was added
> to sanity test the new page group code and we need to fix this, but I’m wondering
> if anything “bad” happens…

I've not noticed anything. In fact, this happened during an LTP run and I
didn't see any regressions in the test results.

Will