Hello Eric,

I'm sorry for the late reply.

> I'm guessing they are horrifically fragmented? What does xfs_bmap tell you
> about the number of extents in one of these files?

Unfortunately, xfs_bmap blocks on this file too:

[ +0.000321] task:xfs_io state:D stack: 0 pid:15728 ppid: 15725 flags:0x00000080
[ +0.000333] Call Trace:
[ +0.000161]  __schedule+0x231/0x760
[ +0.000195]  ? page_add_new_anon_rmap+0x9e/0x1f0
[ +0.000207]  schedule+0x3c/0xa0
[ +0.000175]  rwsem_down_write_slowpath+0x32c/0x4e0
[ +0.000216]  ? get_page_from_freelist+0x190d/0x1c60
[ +0.000250]  xfs_ilock_data_map_shared+0x29/0x30 [xfs]
[ +0.000312]  xfs_getbmap+0xe2/0x7b0 [xfs]
[ +0.000197]  ? _cond_resched+0x15/0x30
[ +0.000203]  ? __kmalloc_node+0x4a4/0x4e0
[ +0.000230]  xfs_ioc_getbmap+0xf5/0x270 [xfs]
[ +0.000260]  xfs_file_ioctl+0x4da/0xbc0 [xfs]
[ +0.000205]  ? __mod_memcg_lruvec_state+0x21/0x100
[ +0.000203]  ? page_add_new_anon_rmap+0x9e/0x1f0
[ +0.000209]  ? __raw_spin_unlock+0x5/0x10
[ +0.000188]  ? __handle_mm_fault+0xbb0/0x1410
[ +0.000221]  ? handle_mm_fault+0xd0/0x290
[ +0.000191]  __x64_sys_ioctl+0x84/0xc0
[ +0.000181]  do_syscall_64+0x33/0x40
[ +0.000188]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000213] RIP: 0033:0x7fdc81f694a7
[ +0.000192] RSP: 002b:00007ffe98c69998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ +0.000319] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdc81f694a7
[ +0.000311] RDX: 00000000010d6a00 RSI: ffffffffc0205838 RDI: 0000000000000003
[ +0.000322] RBP: 0000000000000020 R08: 0000000000000000 R09: 0000000000000600
[ +0.000303] R10: 0000000000000048 R11: 0000000000000246 R12: 0000000000000000
[ +0.000310] R13: 00000000010d6a00 R14: 0000000000000000 R15: 0000000000000000

>
> When it is blocked, where is it blocked? (try sysrq-w)

[ +0.016252] task:pv state:D stack: 0 pid:15507 ppid: 1161 flags:0x00004080
[ +0.000373] Call Trace:
[ +0.000177]  __schedule+0x231/0x760
[ +0.000190]  schedule+0x3c/0xa0
[ +0.000175]  schedule_timeout+0x215/0x2b0
[ +0.000197]  ? blk_mq_get_tag+0x244/0x280
[ +0.000201]  __down+0x9b/0xf0
[ +0.000189]  ? blk_mq_complete_request_remote+0x50/0xc0
[ +0.000223]  down+0x3b/0x50
[ +0.000385]  xfs_buf_lock+0x2c/0xb0 [xfs]
[ +0.000259]  xfs_buf_find.isra.32+0x3d9/0x610 [xfs]
[ +0.000275]  xfs_buf_get_map+0x4c/0x2e0 [xfs]
[ +0.000199]  ? submit_bio+0x43/0x160
[ +0.000232]  xfs_buf_read_map+0x55/0x2c0 [xfs]
[ +0.000237]  ? xfs_btree_read_buf_block.constprop.40+0x95/0xd0 [xfs]
[ +0.000328]  xfs_trans_read_buf_map+0x123/0x2d0 [xfs]
[ +0.000279]  ? xfs_btree_read_buf_block.constprop.40+0x95/0xd0 [xfs]
[ +0.000299]  xfs_btree_read_buf_block.constprop.40+0x95/0xd0 [xfs]
[ +0.000301]  xfs_btree_lookup_get_block+0x95/0x170 [xfs]
[ +0.000263]  ? xfs_bmap_validate_extent+0xa0/0xa0 [xfs]
[ +0.000257]  xfs_btree_visit_block+0x85/0xc0 [xfs]
[ +0.000237]  ? xfs_bmap_validate_extent+0xa0/0xa0 [xfs]
[ +0.000263]  xfs_btree_visit_blocks+0x109/0x120 [xfs]
[ +0.000246]  xfs_iread_extents+0x9f/0x170 [xfs]
[ +0.000246]  ? xfs_bmapi_read+0x23b/0x2c0 [xfs]
[ +0.000233]  xfs_bmapi_read+0x23b/0x2c0 [xfs]
[ +0.000214]  ? _cond_resched+0x15/0x30
[ +0.000214]  ? down_write+0xe/0x40
[ +0.000230]  xfs_read_iomap_begin+0xea/0x1e0 [xfs]
[ +0.000228]  iomap_apply+0x94/0x2d0
[ +0.000181]  ? iomap_page_mkwrite_actor+0x70/0x70
[ +0.008736]  ? iomap_page_mkwrite_actor+0x70/0x70
[ +0.000219]  iomap_readahead+0x9a/0x150
[ +0.000207]  ? iomap_page_mkwrite_actor+0x70/0x70
[ +0.000216]  read_pages+0x8e/0x1f0
[ +0.000183]  page_cache_ra_unbounded+0x19d/0x1f0
[ +0.000207]  generic_file_buffered_read+0x3f8/0x800
[ +0.000266]  xfs_file_buffered_aio_read+0x44/0xb0 [xfs]
[ +0.000280]  xfs_file_read_iter+0x68/0xc0 [xfs]
[ +0.000204]  new_sync_read+0x118/0x1a0
[ +0.000195]  vfs_read+0xf1/0x180
[ +0.000173]  ksys_read+0x59/0xd0
[ +0.000187]  do_syscall_64+0x33/0x40
[ +0.000186]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000215] RIP: 0033:0x7f209bf06b40
[ +0.000179] RSP: 002b:00007ffd9d11aeb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ +0.000383] RAX: ffffffffffffffda RBX: 00007ffd9d11b0c0 RCX: 00007f209bf06b40
[ +0.000308] RDX: 0000000000020000 RSI: 00007f209c3d9010 RDI: 0000000000000003
[ +0.000309] RBP: 00007ffd9d11b0c4 R08: 0000000000000000 R09: 0000000000000004
[ +0.000308] R10: 00007ffd9d11a2a0 R11: 0000000000000246 R12: 00000000018290d0
[ +0.000318] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000020000
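In case the capture method matters: stacks of blocked (D-state) tasks like the two above can be dumped on demand via sysrq-w, so I can re-capture them whenever that would be useful, roughly like this:

  # enable the sysrq interface, dump blocked-task stacks, then read them from the kernel log
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 200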
>
> >I tried running xfs_repair on the volume, but this seems to behave in a
> >very similar way - very quickly it gets into an almost stalled state, with
> >almost no progress..
>
> Perceived performance won't be fixed by repair, but...
>
> >[root@spbstdnas ~]# xfs_repair -P -t 60 -v -v -v -v /dev/sdk
> >Phase 1 - find and verify superblock...
> >        - max_mem = 154604838, icount = 9664, imem = 37, dblock = 382464425984, dmem = 186750208
> >Memory available for repair (150981MB) may not be sufficient.
> >At least 182422MB is needed to repair this filesystem efficiently
> >If repair fails due to lack of memory, please
> >increase system RAM and/or swap space to at least 364844MB.
>
> ... it /is/ telling you that it would like a lot more memory to do
> its job.
>
> >Phase 2 - using internal log
> >        - zero log...
> >zero_log: head block 1454674 tail block 1454674
> >        - scan filesystem freespace and inode maps...
> >        - found root inode chunk
> ...
> >Phase 3 - for each AG...
> >        - scan and clear agi unlinked lists...
> >        - process known inodes and perform inode discovery...
> >        - agno = 0
> >        - agno = 1
> >        - agno = 2
> >
> >
> >        - agno = 3
> >
> >The VM has 200GB of RAM, but xfs_repair does not use more than 1GB and the
> >CPU is idle; it just keeps reading at the same slow speed, ~200K/s, 50 IOPS.
>
> Rather than diagnosing repair at this point, let's first see where you're
> blocked when you're reading the sparse files on the filesystem as suggested
> above.

OK. Please let me know if I can provide any further info.
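If it would help to compare against the raw device again, I can rerun the blktrace / fio / dd checks from my original report (quoted below) and post the full results; I'd use a read-only job along these lines (the parameters here are only an illustration, not the exact earlier runs):

  # read-only random-read check against the raw device (no writes are issued)
  fio --name=rawread --filename=/dev/sdk --readonly --direct=1 \
      --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
      --runtime=30 --time_based --group_reporting

  # or a plain sequential read of one of the regions blktrace showed as slow
  # (<offset_MiB> is a placeholder for the starting offset, in 1MiB blocks)
  dd if=/dev/sdk of=/dev/null bs=1M count=4096 skip=<offset_MiB> iflag=direct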
with best regards

nikola ciprich

>
> -Eric
>
> >I've carefully checked, and the storage itself is much faster than this: with
> >blktrace I checked which areas of the volume it is currently reading, and
> >reading those same areas with fio / dd shows they can be read much faster
> >(the same holds for random reads of any area of the volume and for
> >random-read or sequential-read fio benchmarks).
> >
> >I've found one very old report pretty much resembling my problem:
> >
> >https://www.spinics.net/lists/xfs/msg06585.html
> >
> >but it is 10 years old and didn't lead to any conclusion.
> >
> >Is it possible there is still some bug common to the XFS kernel module and xfs_repair?
> >
> >I tried the 5.4.135 and 5.10.31 kernels, and xfsprogs 4.5.0 and 5.13.0
> >(the OS is x86_64 CentOS 7).
> >
> >Any hints on how I could debug this further?
> >
> >I'd be very grateful for any help.
> >
> >with best regards
> >
> >nikola ciprich
>
-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobile: +420 777 093 799
www.linuxbox.cz

service mobile: +420 737 238 656
service email:  servis@xxxxxxxxxxx
-------------------------------------