Re: Recently-formatted XFS filesystems reporting negative used space

Carlos Maiolino <cmaiolino@xxxxxxxxxx> · Tue, 17 Jul 2018 11:26:26 +0200

On Mon, Jul 16, 2018 at 11:29:51AM +0200, Filippo Giunchedi wrote:
> On Wed, Jul 11, 2018 at 10:31 AM Filippo Giunchedi
> <fgiunchedi@xxxxxxxxxxxxx> wrote:
> > > that sb_fdblocks really is ~17T which indicates the problem
> > > really is on disk.
> > >
> > > 4461713825
> > > 100001001111100000101100110100001
> > > 166746529
> > >      1001111100000101100110100001
> > >
> > > you have a bit flipped in the problematic value... but you're running
> > > with CRCs so it seems unlikely to have been some sort of bit-rot (that,
> > > and the fact that you're hitting the same problem on multiple nodes).
> >
> > Ouch, indeed we've seen this problem on multiple nodes, said hosts
> > belong to the same and latest shipment from the OEM. We'll run
> > hardware diagnostics on these hosts and others we've received at
> > another datacenter (which haven't shown issues so far but don't serve
> > reads either).
> 
> Update on this: we've run hw diagnostics and couldn't find anything
> wrong. xfs_repair does fix the issue, so we'll be going ahead with
> that. Is there anything we can do to help with debugging in case this
> happens again?
> 

There is a patch being discussed on the list to help catch these bit corruptions
before they reach the disk, but bear in mind that we can only improve the
validation of our metadata. Nothing rules out that the same bit flips are also
occurring in your data, in which case you are writing corrupted data into your
files.
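
For reference, the single-bit difference between the two sb_fdblocks values
quoted above can be checked by XOR-ing them. This is only a rough sketch in
Python, not anything taken from xfsprogs. The variable names are made up for
illustration, and the 4 KiB block size used for the size estimate is an
assumption about this filesystem, not something stated in the thread.

```python
# Values quoted earlier in this thread.
reported = 4461713825  # sb_fdblocks as read from disk
cleared = 166746529    # the same value without the suspect high bit

# XOR leaves only the bits that differ between the two values.
diff = reported ^ cleared

# A non-zero value with exactly one bit set is a power of two.
single_bit = diff != 0 and (diff & (diff - 1)) == 0

# Zero-based position of the differing bit (bit 0 is the least significant).
bit_index = diff.bit_length() - 1

# ASSUMPTION: 4096-byte filesystem blocks (not stated in the thread).
BLOCK_SIZE = 4096
reported_tib = reported * BLOCK_SIZE / 2**40

print(f"xor        = {diff:#x}")
print(f"single bit = {single_bit}")
print(f"bit index  = {bit_index}")
print(f"free space = {reported_tib:.2f} TiB (assuming {BLOCK_SIZE}-byte blocks)")
```

If the two values differ in exactly one bit, single_bit is True and bit_index
gives its position. Under the 4 KiB assumption, the reported value works out to
a bit under 17 TiB of free space, in line with the ~17T mentioned above.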

Cheers

> thanks a lot!
> Filippo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos