[PATCH 0/6] xfs: fix the bulkstat mess

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 4 Nov 2014 23:53:15 +1100

Hi folks,

This series of patches applies on top of the xfs-fixes-for-3.18-rc3
branch in the upstream repository. The bulkstat refactoring in 3.17
broke bulkstat in a couple of subtle ways and exposed an API wart
that was undocumented and completely non-obvious and these lead to
xfsdump failing badly on some filesystems. The first 5 patches in
this series are necessary to fix the regressions, the 6th patch
cleans up a remaining loose end.

Essentially, the old bulkstat code was pretty much unchanged in
algorithm since 1995 when a commit was made to fix a deadlock. See
the commit in the XFS archive tree 314ccac26af ("Fix a locking bug
in bulkstat by rewriting it"). Unfortunately, this commit introduced
the unexpected behviour that the inode pointed to by the lastino
cookie is not included in the bulkstat output.

THe result of this is some tricky code in xfsdump - again
undocumented - where a specific bulkstat scan during the dump
process does not use the bulkstat lastino cookie, but uses it's own
internal inode map to determine where to start the next bulkstat
call. In this case, the bulkstat cookie has a magic "target - 1"
calculation. In all other bulkstat cases, the cookie from the
previous call is passed into the next call. This magic code in
cfsdump was introduced in 2006 as an optimisation to avoid making
too many bulkstat calls.

Essentially, the xfsdump code is working around the undocumented
lastino cookie behaviour. Seeing as userspace has already coded for
this bug, we can't change that behaviour now, and that's exactly
what the refactoring in 3.17 did - it rejected "lastino - 1" in
certain circumstances with EINVAL instead of doing the right thing
and returning the correct inodes.

But, this was the last problem I found, because the EINVAL was
dropped on the floor and ignored, because bulkstat only returned
errors from the formatting function of inodes it did find. Hence the
fact it didn't scan an inode it should have was silently lost.

I found this problem after I realised that errors from the
formatting function are only returned if no other inodes were
successfully formatted at all. Which means, in reality, errors
encountered during bulkstat are universally ignored and the result
is "successful" scans that are silently missing inodes. Which means
xfsdump succeeds, but xfsrestore tells us that it was a corrupted
dump.

Of course, in getting to this point, I first had to add the missing
checks in the btree walk termination detection, fix the user buffer
space tracking, the inode tracking, the slightly broken chunk
formattting loop conditions, remove a bunch of unused variables,
and so on. There was a *lot* that didn't work correctly and I'm
still slightly stunned at the fact it worked well enough to pass
all the xfsdump/restore QA in xfstests.

As such, the first 5 patches are -stable kernel material, and the
last patch should go with them into the 3.18 tree to complete the
cleanup of code. This needs eyeballs and testing - create some
metadumps of your production filesystems and use them to test
xfsdump on a kernel with these changes......

Comments, thoughts, testing all welcome as this needs to go upstream
sooner rather than later....

-Dave.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs