Re: corrupt xfs log

On Fri, Sep 01, 2017 at 07:33:07AM -0400, Brian Foster wrote:
> On Fri, Sep 01, 2017 at 08:48:03AM +0200, Ingard - wrote:
> > On Thu, Aug 31, 2017 at 12:20 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > > On Thu, Aug 31, 2017 at 09:27:52AM +0200, Ingard - wrote:
> > >> On Wed, Aug 30, 2017 at 4:58 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > >> > On Mon, Aug 21, 2017 at 10:24:32PM +0200, Ingard - wrote:
> > >> >> On Mon, Aug 21, 2017 at 5:51 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > >> >> > On Mon, Aug 21, 2017 at 02:08:43PM +0200, Ingard - wrote:
> > >> >> >> On Fri, Aug 18, 2017 at 2:17 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > >> >> >> > On Fri, Aug 18, 2017 at 07:02:24AM -0500, Bill O'Donnell wrote:
> > >> >> >> >> On Fri, Aug 18, 2017 at 01:56:31PM +0200, Ingard - wrote:
> > >> >> >> >> > After a server crash we've encountered a corrupt xfs filesystem. When
> > >> >> >> >> > trying to mount said filesystem normally the system hangs.
> > >> >> >> >> >> > This was initially on an Ubuntu Trusty server with a 3.13 kernel and
> > >> >> >> >> >> > xfsprogs 3.1.9.
> > >> >> >> >> >
> > >> >> >> >> > We've installed a newer kernel (4.4.0-92) and compiled xfsprogs v
> > >> >> >> >> > 4.12.0 from source. We're still not able to mount the filesystem (and
> > >> >> >> >> > replay the log) normally.
> > >> >> >> >> >> > We are able to mount it with -o ro,norecovery, but we're reluctant to
> > >> >> >> >> >> > do xfs_repair -L without trying everything else we can first. The
> > >> >> >> >> >> > filesystem is browsable, apart from a few paths which give the error
> > >> >> >> >> >> > "Structure needs cleaning".
> > >> >> >> >> >
> > >> >> >> >> > Does anyone have any advice as to how we might recover/repair the
> > >> >> >> >> > corrupt log so we can replay it? Or is xfs_repair -L the only way
> > >> >> >> >> > forward?
> > >> >> >> >>
> > >> >> >> >> Can you try xfs_repair -n (only scans the fs and reports what repairs
> > >> >> >> >> would be made)?
> > >> >> >> >>
> > >> >> >> >
> > >> >> >> > An xfs_metadump of the fs might be useful as well. Then we can see if we
> > >> >> >> > can reproduce the mount hang on latest kernels and if so, potentially
> > >> >> >> > try and root cause it.
> > >> >> >> >
> > >> >> >> > Brian
> > >> >> >>
> > >> >> >> Here is a link for the metadump :
> > >> >> >> https://www.jottacloud.com/p/ingardme/95ec2e45ba80431d962345981d38bdff
> > >> >> >
> > >> >> > This points to a 29GB image file, apparently uncompressed? Could you
> > >> >> > upload a compressed file? Thanks.
> > >> >>
> > >> >> Hi. Sorry about that. Didn't realize the output would be compressible.
> > >> >> Here is a link to the compressed tgz (6G)
> > >> >> https://www.jottacloud.com/p/ingardme/cac6939649e14b98b928647f5222a2ae
> > >> >>
> > >> >
> > >> > I finally played around with this image a bit. Note that the mount does
> > >> > not hang on the latest kernels. Instead, log recovery emits a torn-write
> > >> > message due to a bad CRC at the head of the log and then ultimately fails
> > >> > due to a bad CRC at the tail of the log. I ran a couple of experiments to
> > >> > skip the bad-CRC records and/or to completely ignore all bad CRCs, and
> > >> > both still either fail to mount (due to other corruption) or continue to
> > >> > show corruption in the recovered fs.
> > >> >
> > >> > It's not clear to me what would have caused this corruption or log
> > >> > state. Have you encountered any corruption before? If not, is this kind
> > >> > of crash or unclean shutdown of the server an uncommon event?
> > >> We failed to notice the log messages about the corrupt fs at first. After
> > >> a few days of these messages the filesystem got shut down, presumably due
> > >> to excessive corruption.
> > >> At that point we tried to reboot normally, but ended up having to do a
> > >> hard reset of the server.
> > >> It is not clear to us either why the corruption happened in the first
> > >> place. The underlying RAID has been in optimal state the whole time.
> > >>
> > >
> > > Ok, so corruption was the first problem. If the filesystem shut down with
> > > something other than a log I/O error, chances are the log was flushed at
> > > that time. It is strange that log records ended up corrupted, though it's
> > > not terribly out of the ordinary for the mount to ultimately fail if
> > > recovery stumbles over existing on-disk corruption, for instance.
> > > An xfs_repair was probably a foregone conclusion anyway, given that the
> > > corruption started on disk.
> > 
> > Out of curiosity, how long did the xfs_mdrestore command run? I'm
> > pushing 20-ish hours now and noticed the following in kern.log:
> > 2017-09-01T08:47:23.414139+02:00 dn-238 kernel: [1278740.983304] XFS:
> > xfs_mdrestore(5176) possible memory allocation deadlock size 37136 in
> > kmem_alloc (mode:0x2400240)
> > 
> 
> Heh. It certainly wasn't quick since it had to restore ~30GB or so of
> metadata, but it didn't take that long. If I had to guess, I'd say it
> restored within an hour.
> 
> It seems like you're running into the in-core extent list problem, which
> causes pain for highly sparse or fragmented files because we store the
> entire extent list in memory. An fiemap of the restored image I have
> lying around shows over 1.5m extents. :/ You may need a box with more
> RAM (I had 32GB), or otherwise find a large enough block device to
> restore the metadump to. If you have to bypass that step, you could at
> least run 'xfs_repair -n' on the original fs to see whether repair runs
> to completion in your environment.
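>
> A rough sketch of what I mean (paths are placeholders; adjust for your
> setup):
>
>     # how many extents does the restored image carry?
>     filefrag restored.img
>     # or list the mappings and count them
>     xfs_bmap restored.img | wc -l
>
>     # read-only check of the original fs: reports what xfs_repair
>     # would change, without writing anything
>     xfs_repair -n /dev/sdd1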

/me wonders if it'd help if mdrestore had a command-line arg to
fallocate the target file beforehand (lots of wasted space but fewer
extents) or to set an extent size hint (only useful if the metadata
isn't fragmented). But yes, we should fix the incore extent cache
memory usage problem.
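
Doing that by hand today might look something like this (target path and
sizes are illustrative, not taken from this thread):

    # preallocate the whole target up front: wasteful on space, but the
    # restore then writes into a few large extents instead of ~1.5m
    fallocate -l 35G /scratch/restore.img

    # or, if the scratch fs is XFS, set an extent size hint instead
    xfs_io -f -c 'extsize 1g' /scratch/restore.img

    xfs_mdrestore metadump.img /scratch/restore.img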

--D

> 
> Brian
> 
> > ingard
> > 
> > >
> > > Brian
> > >
> > >> >
> > >> > That aside, I think the best course of action is to run 'xfs_repair -L'
> > >> > on the fs. I ran a v4.12 version against the metadump image and it
> > >> > successfully repaired the fs. I've attached the repair output for
> > >> > reference, but I would recommend first restoring your metadump to a
> > >> > temporary location, attempting to repair that, and examining the results
> > >> > before repairing the original fs. Note that the metadump will not have
> > >> > any file content, but it will show which files might be cleared, moved
> > >> > to lost+found, etc.
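> > >> >
> > >> > Roughly (paths are placeholders for wherever you restore the image):
> > >> >
> > >> >     xfs_mdrestore metadump.img /scratch/restore.img
> > >> >     xfs_repair -f -L /scratch/restore.img   # -f because the target is a file
> > >> >     mount -o loop,ro /scratch/restore.img /mnt/test
> > >> >     # browse /mnt/test and lost+found, then repair the real device
> > >> >     xfs_repair -L /dev/sdd1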
> > >> Ok. Thanks for looking into it. We'll proceed with the suggested
> > >> course of action.
> > >>
> > >> ingard
> > >> >
> > >> > Brian
> > >> >
> > >> >> >
> > >> >> > Brian
> > >> >> >
> > >> >> >> And the repair -n output:
> > >> >> >> https://www.jottacloud.com/p/ingardme/0205c6ca6f7e495ebcda5f255b96f63d
> > >> >> >>
> > >> >> >> kind regards
> > >> >> >> ingard
> > >> >> >>
> > >> >> >> >
> > >> >> >> >> Thanks-
> > >> >> >> >> Bill
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> >
> > >> >> >> >> >
> > >> >> >> >> > Excerpt from kern.log:
> > >> >> >> >> > 2017-08-17T13:40:41.122121+02:00 dn-238 kernel: [  294.300347] XFS
> > >> >> >> >> > (sdd1): Mounting V4 filesystem in no-recovery mode. Filesystem will be
> > >> >> >> >> > inconsistent.
> > >> >> >> >> >
> > >> >> >> >> > 2017-08-17T17:04:54.794194+02:00 dn-238 kernel: [12548.400260] XFS
> > >> >> >> >> > (sdd1): Metadata corruption detected at xfs_inode_buf_verify+0x6f/0xd0
> > >> >> >> >> > [xfs], xfs_inode block 0x81c9c210
> > >> >> >> >> > 2017-08-17T17:04:54.794216+02:00 dn-238 kernel: [12548.400342] XFS
> > >> >> >> >> > (sdd1): Unmount and run xfs_repair
> > >> >> >> >> > 2017-08-17T17:04:54.794218+02:00 dn-238 kernel: [12548.400374] XFS
> > >> >> >> >> > (sdd1): First 64 bytes of corrupted metadata buffer:
> > >> >> >> >> > 2017-08-17T17:04:54.794220+02:00 dn-238 kernel: [12548.400418]
> > >> >> >> >> > ffff880171fff000: 3f 1a 33 54 5b 55 85 0b 7c f5 c6 d5 cf 51 47 41
> > >> >> >> >> > ?.3T[U..|....QGA
> > >> >> >> >> > 2017-08-17T17:04:54.794222+02:00 dn-238 kernel: [12548.400473]
> > >> >> >> >> > ffff880171fff010: 97 ba ba 03 5c e4 02 7a e6 bc fb 5d f1 72 db c1
> > >> >> >> >> > ....\..z...].r..
> > >> >> >> >> > 2017-08-17T17:04:54.794223+02:00 dn-238 kernel: [12548.400527]
> > >> >> >> >> > ffff880171fff020: c8 ad 3a 76 c7 e4 20 92 88 a2 35 0c 1f 36 cf b5
> > >> >> >> >> > ..:v.. ...5..6..
> > >> >> >> >> > 2017-08-17T17:04:54.794226+02:00 dn-238 kernel: [12548.400581]
> > >> >> >> >> > ffff880171fff030: 8a bc 42 75 86 50 a0 a2 be 2c 2d 99 96 2d e1 ee
> > >> >> >> >> > ..Bu.P...,-..-..
> > >> >> >> >> >
> > >> >> >> >> > kind regards
> > >> >> >> >> > ingard