Re: XFS / xfs_repair - problem reading very large sparse files on very large filesystem

Eric Sandeen <sandeen@xxxxxxxxxxx> · Thu, 4 Nov 2021 11:20:12 -0500

On 11/4/21 4:09 AM, Nikola Ciprich wrote:
Hello fellow XFS users and developers,

we've stumbled upon a strange problem which I think might be somewhere
in the XFS code.

we have very large ceph-based storage, on top of which there is a 1.5PiB volume
with an XFS filesystem. This contains very large (i.e. 500TB) sparse files,
partially filled with data.

The problem is that trying to read those files leads to processes blocked in D
state and very poor performance: ~200KiB/s, 50 IOPS.

I'm guessing they are horrifically fragmented? What does xfs_bmap tell you
about the number of extents in one of these files?
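
For reference, `xfs_bmap -v <file>` prints one line per extent or hole, so the extent count can be tallied from its output. A minimal Python sketch follows; the helper names and the regular expression are illustrative assumptions based on the usual `N: [start..end]: ...` line layout and are not part of xfsprogs, so check them against the output on your system.

```python
import re
import subprocess

# Assumed line layout of xfs_bmap output: "  0: [0..7]: 96..103 ..." for a
# mapped extent and "  1: [8..15]: hole" for a hole.
EXTENT_LINE = re.compile(r"^\s*\d+:\s+\[\d+\.\.\d+\]:\s+(\S+)")


def count_extents(bmap_output):
    """Return (mapped_extents, holes) found in xfs_bmap output text."""
    extents = holes = 0
    for line in bmap_output.splitlines():
        match = EXTENT_LINE.match(line)
        if match is None:
            continue  # header lines such as "file:" or the -v column titles
        if match.group(1) == "hole":
            holes += 1
        else:
            extents += 1
    return extents, holes


def bmap_summary(path):
    """Run `xfs_bmap -v` on path (hypothetical helper) and tally the result."""
    result = subprocess.run(
        ["xfs_bmap", "-v", path], check=True, capture_output=True, text=True
    )
    return count_extents(result.stdout)
```

Calling `bmap_summary("/path/to/sparse-file")` on one of the affected files would give a rough idea of how fragmented it is; on a very large, badly fragmented file the xfs_bmap call itself may take a while.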

When it is blocked, where is it blocked?  (try sysrq-w)
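
sysrq-w asks the kernel to dump the stacks of all blocked (uninterruptible) tasks to the kernel log. A rough Python sketch of triggering it, plus a /proc scan for D-state processes as an alternative view, is below. The function names are hypothetical; it assumes a Linux /proc, that sysrq is enabled, and that /proc/<pid>/stack is available (which needs root and kernel stack-trace support).

```python
import os


def trigger_sysrq_w():
    """Equivalent of sysrq-w: dump blocked tasks to the kernel log (root only)."""
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write("w")
    # The traces then show up in the kernel log, e.g. via dmesg.


def parse_state(status_text):
    """Extract the one-letter task state from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def blocked_tasks(proc_root="/proc"):
    """Return (pid, kernel_stack) for every process currently in D state."""
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as status:
                state = parse_state(status.read())
        except OSError:
            continue  # process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "stack")) as stack:
                trace = stack.read()
        except OSError:
            trace = "(stack not readable; run as root)"
        found.append((int(entry), trace))
    return found
```

Running `blocked_tasks()` while a read is stalled lists each D-state process with its kernel stack; this only scans processes, not individual threads, so the sysrq-w dump remains the more complete picture.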

I tried running xfs_repair on the volume, but it seems to behave in a
very similar way: very quickly it gets into an almost stalled state, making
almost no progress.

Perceived performance won't be fixed by repair, but...

[root@spbstdnas ~]# xfs_repair -P -t 60 -v -v -v -v /dev/sdk
Phase 1 - find and verify superblock...
         - max_mem = 154604838, icount = 9664, imem = 37, dblock = 382464425984, dmem = 186750208
Memory available for repair (150981MB) may not be sufficient.
At least 182422MB is needed to repair this filesystem efficiently
If repair fails due to lack of memory, please
increase system RAM and/or swap space to at least 364844MB.

... it /is/ telling you that it would like a lot more memory to do
its job.
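
The arithmetic behind that warning can be checked with a small Python sketch. The numbers are copied from the output above; treating max_mem and dmem as KiB is an assumption, inferred from max_mem / 1024 matching the "150981MB" that repair prints.

```python
# Figures copied from the Phase 1 line of the xfs_repair output.
max_mem = 154_604_838      # memory repair may use; assumed to be in KiB
dmem = 186_750_208         # estimate for tracking data blocks; assumed KiB
dblock = 382_464_425_984   # filesystem size in blocks

# Figures copied from the warning text (MB as printed by xfs_repair).
needed_mb = 182_422
suggested_ram_plus_swap_mb = 364_844

available_mb = max_mem // 1024
shortfall_mb = needed_mb - available_mb

print(f"available to repair: {available_mb} MB")
print(f"shortfall: {shortfall_mb} MB")
print(f"blocks tracked per KiB of dmem: {dblock // dmem}")
print(f"suggested RAM+swap / needed: {suggested_ram_plus_swap_mb / needed_mb}")
```

With these figures, repair has 150981 MB to work with, about 31441 MB short of what it asks for, and the suggested RAM-plus-swap total is exactly twice the "needed" figure.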

Phase 2 - using internal log
         - zero log...
zero_log: head block 1454674 tail block 1454674
         - scan filesystem freespace and inode maps...
         - found root inode chunk
...
Phase 3 - for each AG...
         - scan and clear agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3

The VM has 200GB of RAM, but xfs_repair does not use more than 1GB and the
CPU is idle. It just keeps reading at the same slow speed, ~200K/s, 50 IOPS.

Rather than diagnosing repair at this point, let's first see where you're
blocked when you're reading the sparse files on the filesystem as suggested
above.

-Eric

I've checked carefully, and the storage itself is much faster: I used blktrace
to see which areas of the volume are currently being read, and running
fio / dd against those areas shows much higher throughput (as does randomly
reading any area of the volume, or random-read and sequential-read fio benchmarks).

I've found one very old report that closely resembles my problem:

https://www.spinics.net/lists/xfs/msg06585.html

but it is 10 years old and didn't lead to any conclusion.

Is it possible there is still some bug common to the XFS kernel module and xfs_repair?

I tried the 5.4.135 and 5.10.31 kernels, and xfsprogs 4.5.0 and 5.13.0
(the OS is x86_64 CentOS 7).

Any hints on how I could debug this further?

I'd be very grateful for any help.

with best regards

nikola ciprich