Hi,
Recently I've been hitting the following soft lockup related to cgroup charging
on aarch64:
watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [btrfs:698546]
Modules linked in: dm_log_writes dm_flakey nls_ascii nls_cp437 vfat
crct10dif_ce polyval_ce polyval_generic ghash_ce rtc_efi fat processor
btrfs xor xor_neon raid6_pq zstd_compress fuse loop nfnetlink
qemu_fw_cfg ext4 mbcache jbd2 dm_mod xhci_pci virtio_net
xhci_pci_renesas net_failover xhci_hcd virtio_balloon virtio_scsi
failover dimlib virtio_blk virtio_console virtio_mmio
irq event stamp: 47291484
hardirqs last enabled at (47291483): [<ffffabe6d1a5d294>]
try_charge_memcg+0x3ac/0x780
hardirqs last disabled at (47291484): [<ffffabe6d2401244>]
el1_interrupt+0x24/0x80
softirqs last enabled at (47282714): [<ffffabe6d168e7a4>]
handle_softirqs+0x2bc/0x310
softirqs last disabled at (47282709): [<ffffabe6d16301e4>]
__do_softirq+0x1c/0x28
CPU: 3 PID: 698546 Comm: btrfs Not tainted 6.10.0-rc6-custom+ #34
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_charge_memcg+0x154/0x780
lr : try_charge_memcg+0x3ac/0x780
sp : ffff800089b83430
x29: ffff800089b834a0 x28: 0000000000000002 x27: ffffabe6d2b515e8
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000008c40
x23: ffffabe6d2b515e8 x22: 0000000000000000 x21: 0000000000000040
x20: ffff4854c6b32000 x19: 0000000000000004 x18: 0000000000000000
x17: 0000000000000000 x16: ffffabe6d19474a8 x15: ffff4854d24b6f88
x14: 0000000000000000 x13: 0000000000000000 x12: ffff4854ff1cdfd0
x11: ffffabe6d4330370 x10: ffffabe6d46442ec x9 : ffffabe6d2b3f6e4
x8 : ffff800089b83340 x7 : ffff800089b84000 x6 : ffff800089b80000
x5 : 0000000000000000 x4 : 0000000000000006 x3 : 000000ffffffffff
x2 : 0000000000000001 x1 : ffffabe6d2b3f6e0 x0 : 0000000002d19c5b
Call trace:
try_charge_memcg+0x154/0x780
__mem_cgroup_charge+0x5c/0xc0
filemap_add_folio+0x5c/0x118
attach_eb_folio_to_filemap+0x84/0x4e0 [btrfs]
alloc_extent_buffer+0x1d4/0x730 [btrfs]
btrfs_find_create_tree_block+0x20/0x48 [btrfs]
btrfs_readahead_tree_block+0x4c/0xd8 [btrfs]
relocate_tree_blocks+0x1d8/0x3a0 [btrfs]
relocate_block_group+0x37c/0x508 [btrfs]
btrfs_relocate_block_group+0x274/0x458 [btrfs]
btrfs_relocate_chunk+0x54/0x1b8 [btrfs]
__btrfs_balance+0x2dc/0x4e0 [btrfs]
btrfs_balance+0x3b4/0x730 [btrfs]
btrfs_ioctl_balance+0x12c/0x300 [btrfs]
btrfs_ioctl+0xf90/0x1380 [btrfs]
__arm64_sys_ioctl+0xb4/0x100
invoke_syscall+0x74/0x100
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x54/0x1c0
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x194/0x198
I can hit this somewhat reliably (around 2 out of 3 runs).
The code is a modified btrfs branch
(https://github.com/adam900710/linux/tree/larger_meta_folio), which does
something like:
- Allocate an order 2 folio using GFP_NOFS | __GFP_NOFAIL
- Attach that order 2 folio to filemap using GFP_NOFS | __GFP_NOFAIL
With extra handling for EEXIST (see the sketch below).
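In rough pseudo-code the attach path looks like this (a simplified sketch
only, not the exact code from the branch; attach_eb_folio() is just a
placeholder name and the EEXIST handling details are abbreviated):

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/pagemap.h>
#include <linux/err.h>

static int attach_eb_folio(struct address_space *mapping, pgoff_t index,
			   struct folio **folio_ret)
{
	struct folio *folio;
	int ret;

	/* Order-2 folio; __GFP_NOFAIL means this cannot return NULL. */
	folio = folio_alloc(GFP_NOFS | __GFP_NOFAIL, 2);

	/* Same gfp flags as the original order-0 path. */
	ret = filemap_add_folio(mapping, folio, index,
				GFP_NOFS | __GFP_NOFAIL);
	if (ret == -EEXIST) {
		/*
		 * Someone else attached a folio at @index first; drop
		 * ours and grab the existing one instead.
		 */
		folio_put(folio);
		folio = filemap_lock_folio(mapping, index);
		if (IS_ERR(folio))
			return PTR_ERR(folio);
		ret = 0;
	} else if (ret) {
		folio_put(folio);
		return ret;
	}

	*folio_ret = folio;
	return ret;
}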
Meanwhile, compared to the original btrfs code, the only difference is the
folio order (order 2 vs order 0).
Considering the gfp flags are the same and only the order differs, I'm
wondering whether the memory cgroup is doing something weird, or whether this
is not the correct way to add a higher-order folio to the page cache?
Thanks,
Qu