Re: Recently-formatted XFS filesystems reporting negative used space

Eric Sandeen <sandeen@xxxxxxxxxxx> · Sat, 21 Jul 2018 17:03:22 -0700

On 7/20/18 3:20 AM, Filippo Giunchedi wrote:
> On Tue, Jul 17, 2018 at 11:26 AM Carlos Maiolino <cmaiolino@xxxxxxxxxx> wrote:
>>>> Ouch. Indeed, we've seen this problem on multiple nodes; the affected
>>>> hosts all belong to the same, most recent shipment from the OEM. We'll
>>>> run hardware diagnostics on these hosts and on others we've received at
>>>> another datacenter (which haven't shown issues so far, but don't serve
>>>> reads either).
>>>
>>> Update on this: we've run hw diagnostics and couldn't find anything
>>> wrong. xfs_repair does fix the issue, so we'll be going ahead with
>>> that. Is there anything we can do to help debugging in case this
>>> happens again?
>>>
>>
>> There is a patch being discussed on the list to help catch these bit corruptions
>> before they reach the disk, but bear in mind that we can only improve the
>> validation of our metadata. Nothing prevents these bit flips from also occurring
>> in your data, in which case you would be writing corrupted data into your files.
> 
> We've found no other cases of bit flips or corruption in metadata or
> in the data itself, though.
> To recap why we think hardware bit flipping is extremely unlikely:
> the same type of sb_fdblocks corruption has appeared on four different
> hosts, affecting at most one third of the XFS filesystems per host. Also,
> the corruption always looks the same, namely the 33rd bit flipped,
> which also seems suspicious.

Running a debug kernel with memory poisoning, KASAN, or something similar might
help catch it if it's a stray memory write of some sort...
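To make the reported pattern concrete, here is a minimal sketch of how flipping the 33rd bit of a 64-bit free-block counter such as sb_fdblocks turns into negative used space. It assumes the corrupted value is what ends up reported as free blocks, and that used space is computed as total minus free, the way df derives it. The helpers flip_bit and used_blocks and all block counts are hypothetical illustrations, not XFS code.

```python
# Sketch: why one flipped bit in a free-block counter shows up as
# negative "used" space. All block counts are hypothetical examples.

BIT = 32  # the "33rd bit" counting from 1 is bit index 32 counting from 0
U64_MASK = (1 << 64) - 1  # sb_fdblocks is a 64-bit on-disk field


def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit toggled, kept within 64 bits."""
    return (value ^ (1 << bit)) & U64_MASK


def used_blocks(total: int, free: int) -> int:
    """Used space derived as total minus free (df-style)."""
    return total - free


total = 2_000_000_000  # hypothetical total data blocks
free = 1_900_000_000   # hypothetical free blocks before corruption
bad_free = flip_bit(free, BIT)

# Bit 32 was clear in `free`, so the flip adds exactly 2**32 blocks.
print("extra free blocks:", bad_free - free)
print("used before flip: ", used_blocks(total, free))
print("used after flip:  ", used_blocks(total, bad_free))  # negative
```

At a 4 KiB block size, 2**32 blocks is 16 TiB, so on any filesystem smaller than that a flip of a previously clear bit 32 reports more free space than the device holds, which is why it shows up as a large negative usage rather than a subtle miscount.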

-Eric