Re: Recently-formatted XFS filesystems reporting negative used space

Thanks Eric and Dave!

On Tue, Jul 10, 2018 at 11:39 PM Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>
> On 7/10/18 8:43 AM, Filippo Giunchedi wrote:
> > Hello,
> > a little background: at Wikimedia Foundation we run a 30-host
> > OpenStack Swift cluster to host user media uploads; each host has 12
> > spinning disks, each formatted individually with XFS.
> >
> > Some of the recently-formatted filesystems have started reporting
> > negative used space upon hitting around 70% usage, while other
> > filesystems on the same host kept reporting as expected:
> >
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/sdn1       3.7T  -14T   17T    - /srv/swift-storage/sdn1
> > /dev/sdh1       3.7T  -13T   17T    - /srv/swift-storage/sdh1
> > /dev/sdc1       3.7T  3.0T  670G  83% /srv/swift-storage/sdc1
> > /dev/sdk1       3.7T  3.1T  643G  83% /srv/swift-storage/sdk1
> >
> > We have experienced this bug only on the last four machines put into
> > service, which were formatted with xfsprogs 4.9.0+nmu1 from Debian
> > Stretch. The remaining hosts were formatted in the past with xfsprogs
> > 3.2.1 or older.
> > We also have a standby cluster in another datacenter with a similar
> > configuration, whose hosts received write traffic only but no read
> > traffic; the standby cluster hasn't experienced the bug and all its
> > filesystems report the correct usage.
> > As far as I can tell, the difference in the xfsprogs version used for
> > formatting means the defaults have changed (e.g. CRCs are enabled on
> > the affected filesystems). Have you seen this issue before, and do
> > you know how to fix it?
> >
> > I would love to help debug this issue; we've been detailing the
> > work done so far at https://phabricator.wikimedia.org/T199198
>
> What kernel are the problematic nodes running?

All nodes are running kernel 4.9.82-1+deb9u3.
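
As an aside, the mkfs-defaults difference is easy to double-check after
the fact: for a mounted filesystem, xfs_info reports the crc flag in
its meta-data section. A quick sketch (mount point taken from the df
output above; the exact flags printed vary with the xfsprogs version):

root@ms-be1040:~# xfs_info /srv/swift-storage/sdn1 | grep -o 'crc=.'
crc=1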

> From your repair output:
>
> root@ms-be1040:~# xfs_repair -n /dev/sde1
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> sb_fdblocks 4461713825, counted 166746529
>         - found root inode chunk
>
> that sb_fdblocks really is ~17T, which indicates the problem
> really is on disk.
>
> 4461713825
> 100001001111100000101100110100001
> 166746529
>      1001111100000101100110100001
>
> you have a bit flipped in the problematic value... but you're running
> with CRCs so it seems unlikely to have been some sort of bit-rot (that,
> and the fact that you're hitting the same problem on multiple nodes).
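
For what it's worth, the two values differ by exactly 2^32 (a single
flipped bit, bit 32), which also squares with the df output: df
computes used as size minus free, and with the free-block counter
inflated to ~17T on a 3.7T filesystem, used comes out around -13T. A
quick bash-arithmetic check, using the two constants from the repair
output above:

root@ms-be1040:~# echo $(( 4461713825 ^ 166746529 )) $(( 2 ** 32 ))
4294967296 4294967296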

Ouch, indeed we've seen this problem on multiple nodes; the affected
hosts all belong to the same (and latest) shipment from the OEM. We'll
run hardware diagnostics on these hosts and on others we've received
at another datacenter (which haven't shown issues so far, but don't
serve reads either).
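
In the meantime, since the dry run already counts the correct value,
my understanding is that a full (non -n) xfs_repair pass should
rewrite the bad counter; a minimal sketch for one affected device,
assuming it has first been drained from the Swift ring (device and
mount point as in the repair output above):

root@ms-be1040:~# umount /srv/swift-storage/sde1
root@ms-be1040:~# xfs_repair /dev/sde1
root@ms-be1040:~# mount /srv/swift-storage/sde1   # assumes an fstab entry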

thanks for your help!

Filippo