On Thu, Aug 25, 2016 at 10:35:48AM +1000, Dave Chinner wrote:
> On Wed, Aug 24, 2016 at 04:42:37PM -0700, Darrick J. Wong wrote:
> > Hi everyone,
> >
> > [cc'ing Brian because he was the last one to touch xfs_buf.c]
> >
> > I've been stress-testing xfs_scrub against a 900GB filesystem with 2M inodes
> > using a VM with 512M of RAM. I've noticed that I get BUG messages about
> > pages with negative refcount, but only if the system is under memory pressure.
> > No errors are seen if the VM memory is increased to, say, 20GB.
> >
> > : BUG: Bad page state in process xfs_scrub pfn:00426
> > : page:ffffea0000010980 count:-1 mapcount:0 mapping: (null) index:0x0
> > : flags: 0x0()
> > : page dumped because: nonzero _count
>
> Unless we are double-freeing a buffer, that's not an XFS problem.
> Have you tried with memory poisoning and allocation debug turned on?

Yes. The BUG did not reproduce, though it did take nearly 35min to run
scrub (which usually takes ~2min).

> > : Modules linked in: xfs libcrc32c sch_fq_codel af_packet
> > : CPU: 1 PID: 2058 Comm: xfs_scrub Not tainted 4.8.0-rc3-mcsum #18
>
> The mm architecture was significantly modified in 4.8.0-rc1 - it
> went from per-zone to per-node infrastructure, so it's entirely
> possible this is a memory reclaim regression. Can you reproduce it
> on an older kernel (e.g. 4.7.0)?

I'll try. I noticed that it's easier to make it happen when scrub is
using getfsmap and/or the new in-kernel scrubbers, but that's no big
surprise since that means we're pounding harder on the metadata. :)

> > Obviously, a page refcount of -1 is not a good sign. I had a hunch that
> > the page in question was (hopefully) a page backing an xfs_buf, so I
> > applied the following debug patch:
> >
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 607cc29..144b976 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -317,7 +317,7 @@ xfs_buf_free(
> >
> >  		for (i = 0; i < bp->b_page_count; i++) {
> >  			struct page *page = bp->b_pages[i];
> > -
> > +if (page_ref_count(page) != 1) {xfs_err(NULL, "%s: OHNO! daddr=%llu page=%p ref=%d", __func__, bp->b_bn, page, page_ref_count(page)); dump_stack();}
> >  			__free_page(page);
> >  		}
> >  	} else if (bp->b_flags & _XBF_KMEM)
> >
> > I then saw this:
> >
> > : SGI XFS with ACLs, security attributes, realtime, debug enabled
> > : XFS (sda): Mounting V4 Filesystem
> > : XFS (sda): Ending clean mount
> > : XFS: xfs_buf_free: OHNO! daddr=113849120 page=ffffea0000010980 ref=0
>
> Which implies something else has dropped the page reference count on
> us while we hold a reference to it. What you might like to check is
> what the page reference counts are on /allocation/ to see if we're
> being handed a page from the freelist with a bad ref count....

Zero on allocation, except when we hit the BUG case.

> If the ref counts are good at allocation, but bad on free, then I
> very much doubt it's an XFS problem. We don't actually touch the
> page reference count anywhere, so let's make sure that it's not a
> double free or something like that in XFS first.

I couldn't find any smoking gun inside XFS, which is why I went to the
list -- I figured something must be doing something I don't know about. :)

Anyway, I was going to push out the reflink patches for review, but the
scrubber crashing held me up. Tomorrow, probably. :/

--D

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
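
The reasoning in this thread (good count at allocation but bad count at free means someone else dropped a reference) can be sketched as a small userspace model. This is only an illustration, not kernel code: `struct model_page`, `classify()` and `run_scenario()` are hypothetical names, and the model assumes a freshly allocated page carries exactly one reference, as the `!= 1` test in the debug patch implies.

```c
/* Hypothetical stand-in for the kernel's struct page: only the refcount. */
struct model_page {
	int refcount;
};

enum verdict {
	REFS_OK,	/* one reference at allocation and just before free */
	BAD_AT_ALLOC,	/* page was handed out with a bad count */
	BAD_AT_FREE,	/* count changed while the owner held the page */
};

/* Classify from the counts sampled at allocation and just before free. */
static enum verdict classify(int ref_at_alloc, int ref_at_free)
{
	if (ref_at_alloc != 1)
		return BAD_AT_ALLOC;
	if (ref_at_free != 1)
		return BAD_AT_FREE;
	return REFS_OK;
}

/* Model allocation: assumed to return a page holding one reference. */
static void model_alloc(struct model_page *page)
{
	page->refcount = 1;
}

/* Model some other code dropping a reference it does not own. */
static void model_stray_put(struct model_page *page)
{
	page->refcount--;
}

/* Model the owner's final free: drop one reference, return the new count. */
static int model_free(struct model_page *page)
{
	return --page->refcount;
}

/*
 * Allocate a page, apply stray_puts unowned reference drops, then free it.
 * Stores the count left after the free in *final_count.
 */
static enum verdict run_scenario(int stray_puts, int *final_count)
{
	struct model_page page;
	int at_alloc, at_free;

	model_alloc(&page);
	at_alloc = page.refcount;
	while (stray_puts-- > 0)
		model_stray_put(&page);
	at_free = page.refcount;
	*final_count = model_free(&page);
	return classify(at_alloc, at_free);
}
```

In this model, `run_scenario(1, &count)` samples a count of 0 just before the free and leaves -1 afterwards, which is consistent with the `ref=0` and `count:-1` values in the logs above. It says nothing about where the stray drop actually happens.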