Hi Dave, On Tue, Dec 23, 2014 at 2:39 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > On Mon, Dec 22, 2014 at 09:42:12AM -0500, Brian Foster wrote: >> On Mon, Dec 22, 2014 at 10:08:18AM +1100, Dave Chinner wrote: >> > On Sun, Dec 21, 2014 at 12:13:45PM -0600, Eric Sandeen wrote: >> > > On 12/21/14 5:42 AM, Alex Lyakas wrote: >> > > > Greetings, >> > > > we encountered XFS corruption: >> > > >> > > > kernel: [774772.852316] ffff8801018c5000: 05 d1 fd 01 fd ff 2f ec 2f 8d 82 6a 81 fe c2 0f .....././..j.... >> > > >> > > There should have been 64 bytes of hexdump, not just the single line above, no? >> > >> > Yeah, really need the whole dmesg, because we've got readahead in >> > the picture here so the number of times the corruption error is seen >> > is actually important.... >> > >> > > >> > > > [813114.622928] IP: [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs] >> > > > [813114.622928] PGD 0 >> > > > [813114.622928] Oops: 0000 [#1] SMP >> > > > [813114.622928] CPU 2 >> > > > [813114.622928] Pid: 31120, comm: smbd Tainted: GF W O 3.8.13-030813-generic #201305111843 Bochs Bochs >> > > > [813114.622928] RIP: 0010:[<ffffffffa077bad9>] [<ffffffffa077bad9>] xfs_bmbt_get_all+0x9/0x20 [xfs] >> > > > [813114.622928] RSP: 0018:ffff88010a193798 EFLAGS: 00010297 >> > > > [813114.622928] RAX: 0000000000000964 RBX: ffff880180fa9c38 RCX: ffffa5a5a5a5a5a5 >> > >> > RCX implies gotp->br_startblock was not overwritten by the >> > extent search. i.e. we've called xfs_bmap_search_multi_extents() >> > but no extent was actually found. >> > >> > > > We analyzed several suspects, but all of them fall on disk addresses >> > > > not near the corrupted disk address. I realize that running somewhat >> > > > outdated kernel + our changes within XFSs, points back at us, but >> > > > this is first time we see XFS corruption after about a year of this >> > > > code being exercised. So posting here, just in case this is a known >> > > > issue. >> > > >> > > well, xfs should _never_ oops, even if it encounters corruption. So hopefully >> > > we can work backwards from the trace above to what went wrong here. >> > > >> > > offhand, in xfs_bmap_search_multi_extents(): >> > > >> > > ep = xfs_iext_bno_to_ext(ifp, bno, &lastx); >> > > if (lastx > 0) { >> > > xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx - 1), prevp); >> > > } >> > > if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) { >> > > xfs_bmbt_get_all(ep, gotp); >> > > *eofp = 0; >> > > >> > > xfs_iext_bno_to_ext() can return NULL with lastx set to 0: >> > > >> > > nextents = ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t); >> > > if (nextents == 0) { >> > > *idxp = 0; >> > > return NULL; >> > > } >> > > >> > > (where idxp is the &lastx we sent in) >> > >> > > and if we do that, it sure seems like the "if lastx < ...." test will wind up >> > > sending a null ep into xfs_bmbt_get_all, which would do a null ptr deref. >> > >> > No, it shouldn't because lastx = 0 to get it set that way >> > ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) must be zero. >> > Therefore, this: >> > >> > if (lastx < (ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t))) >> > >> > evaulates as: >> > >> > if (0 < 0) >> > >> > which is not true, so we fall into the else case: >> > >> > } else { >> > if (lastx > 0) { >> > *gotp = *prevp; >> > } >> > *eofp = 1; >> > ep = NULL; >> > } >> > *lastxp = lastx; >> > return ep; >> > >> > Which basically overwrites *eofp and *lastxp, neither of which are >> > NULL. >> > >> > However, the stack trace clearly shows we've just called >> > xfs_bmap_search_multi_extents() - the "?" before the function name >> > means it found the symbol in the stack, but not in the direct line >> > of the frame pointers the current function stack points to. >> > >> > That makes me doubt the accuracy of the stack trace, because the >> > only caller of xfs_bmap_search_multi_extents() is >> > xfs_bmap_search_extents() and xfs_bmap_search_extents does not call >> > xfs_bmbt_get_all() directly like the stack trace would lead us to >> > beleive. Hence I don't think we can trust the stack trace to be >> > pointing use at the correct caller of xfs_bmbt_get_all(), which >> > makes it real hard to isolate the cause... >> > >> >> What seems strange to me here is why are we searching through extents >> when the bmbt is presumed to be corrupt? I suppose we don't know for >> sure whether the backtrace that panics is on the same inode, but the >> fact that the panic is linked with the corruption errors suggests this >> is likely. >> >> Digging through the current tot code to see how that might occur, I >> noticed an XFS_ILOCK_EXCL assert in xfs_iread_extents() that doesn't >> exist in 3.18.3. It looks like part of some fixes Christoph made a while >> back, ending with the following commit in the commit log (see some of >> the immediately prior commits as well): >> >> eef334e5776c xfs: assert that we hold the ilock for extent map access >> >> ... which suggests some paths were reading in inode extents without the >> proper locking. That would appear to be problematic in its own right >> given how XFS_IFEXTENTS is used. If that is the case, I wonder if >> hitting that problem in combination with a bmbt that happens to be >> corrupted is causing us to go off the rails? Just a theory... and >> another reason it would be really nice to have a metadump. ;) > > commit 40194ecc6d78327d98e66de3213db96ca0a31e6f > Author: Ben Myers <bpm@xxxxxxx> > Date: Fri Dec 6 12:30:11 2013 -0800 > > xfs: reinstate the ilock in xfs_readdir > > Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in > xfs_readdir because we might have to read the extent list in from disk. This > keeps other threads from reading from or writing to the extent list while it i > being read in and is still in a transitional state. > > This has been associated with "Access to block zero" messages on directories > with large numbers of extents resulting from excessive filesytem fragmentation > as well as extent list corruption. Unfortunately no test case at this point. > > Signed-off-by: Ben Myers <bpm@xxxxxxx> > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> > > Seems to match the behaviour being seen. > > Alex, what type of inode is the one that is reporting the "access to > block zero" errors? I have just searched the relevant file system for this inode, but such inode was not found:( # find /export/XXX -mount -inum 1946454529 did not find anything. Perhaps it got deleted since the incident. Thanks again, Alex. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs