Hi,
Recently I've been hitting the following soft lockup related to cgroup charging
on aarch64:
watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [btrfs:698546]
Modules linked in: dm_log_writes dm_flakey nls_ascii nls_cp437 vfat
crct10dif_ce polyval_ce polyval_generic ghash_ce rtc_efi fat processor
btrfs xor xor_neon raid6_pq zstd_compress fuse loop nfnetlink
qemu_fw_cfg ext4 mbcache jbd2 dm_mod xhci_pci virtio_net
xhci_pci_renesas net_failover xhci_hcd virtio_balloon virtio_scsi
failover dimlib virtio_blk virtio_console virtio_mmio
irq event stamp: 47291484
hardirqs last enabled at (47291483): [<ffffabe6d1a5d294>]
try_charge_memcg+0x3ac/0x780
hardirqs last disabled at (47291484): [<ffffabe6d2401244>]
el1_interrupt+0x24/0x80
softirqs last enabled at (47282714): [<ffffabe6d168e7a4>]
handle_softirqs+0x2bc/0x310
softirqs last disabled at (47282709): [<ffffabe6d16301e4>]
__do_softirq+0x1c/0x28
CPU: 3 PID: 698546 Comm: btrfs Not tainted 6.10.0-rc6-custom+ #34
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_charge_memcg+0x154/0x780
lr : try_charge_memcg+0x3ac/0x780
sp : ffff800089b83430
x29: ffff800089b834a0 x28: 0000000000000002 x27: ffffabe6d2b515e8
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000008c40
x23: ffffabe6d2b515e8 x22: 0000000000000000 x21: 0000000000000040
x20: ffff4854c6b32000 x19: 0000000000000004 x18: 0000000000000000
x17: 0000000000000000 x16: ffffabe6d19474a8 x15: ffff4854d24b6f88
x14: 0000000000000000 x13: 0000000000000000 x12: ffff4854ff1cdfd0
x11: ffffabe6d4330370 x10: ffffabe6d46442ec x9 : ffffabe6d2b3f6e4
x8 : ffff800089b83340 x7 : ffff800089b84000 x6 : ffff800089b80000
x5 : 0000000000000000 x4 : 0000000000000006 x3 : 000000ffffffffff
x2 : 0000000000000001 x1 : ffffabe6d2b3f6e0 x0 : 0000000002d19c5b
Call trace:
try_charge_memcg+0x154/0x780
__mem_cgroup_charge+0x5c/0xc0
filemap_add_folio+0x5c/0x118
attach_eb_folio_to_filemap+0x84/0x4e0 [btrfs]
alloc_extent_buffer+0x1d4/0x730 [btrfs]
btrfs_find_create_tree_block+0x20/0x48 [btrfs]
btrfs_readahead_tree_block+0x4c/0xd8 [btrfs]
relocate_tree_blocks+0x1d8/0x3a0 [btrfs]
relocate_block_group+0x37c/0x508 [btrfs]
btrfs_relocate_block_group+0x274/0x458 [btrfs]
btrfs_relocate_chunk+0x54/0x1b8 [btrfs]
__btrfs_balance+0x2dc/0x4e0 [btrfs]
btrfs_balance+0x3b4/0x730 [btrfs]
btrfs_ioctl_balance+0x12c/0x300 [btrfs]
btrfs_ioctl+0xf90/0x1380 [btrfs]
__arm64_sys_ioctl+0xb4/0x100
invoke_syscall+0x74/0x100
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x54/0x1c0
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x194/0x198
I can hit this somewhat reliably (around 2 out of 3 runs).
The code is a modified btrfs branch
(https://github.com/adam900710/linux/tree/larger_meta_folio), which does
something like:
- Allocate an order 2 folio using GFP_NOFS | __GFP_NOFAIL
- Attach that order 2 folio to filemap using GFP_NOFS | __GFP_NOFAIL
With extra handling for EEXIST (see the sketch below).
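In rough pseudo-code the attach path looks like this (a simplified sketch
only, not the exact code from the branch; attach_eb_folio() is just a
placeholder name and the EEXIST handling details are abbreviated):

#include <linux/mm.h>
#include <linux/gfp.h>
#include <linux/pagemap.h>
#include <linux/err.h>

static int attach_eb_folio(struct address_space *mapping, pgoff_t index,
			   struct folio **folio_ret)
{
	struct folio *folio;
	int ret;

	/* Order-2 folio; __GFP_NOFAIL means this cannot return NULL. */
	folio = folio_alloc(GFP_NOFS | __GFP_NOFAIL, 2);

	/* Same gfp flags as the original order-0 path. */
	ret = filemap_add_folio(mapping, folio, index,
				GFP_NOFS | __GFP_NOFAIL);
	if (ret == -EEXIST) {
		/*
		 * Someone else attached a folio at @index first; drop
		 * ours and grab the existing one instead.
		 */
		folio_put(folio);
		folio = filemap_lock_folio(mapping, index);
		if (IS_ERR(folio))
			return PTR_ERR(folio);
		ret = 0;
	} else if (ret) {
		folio_put(folio);
		return ret;
	}

	*folio_ret = folio;
	return ret;
}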
Meanwhile, compared to the original btrfs code, the only difference is the
folio order (order 2 vs order 0).
Considering the gfp flags are the same and only the order differs, I'm
wondering whether the memory cgroup is doing something weird, or whether this
is not the correct way to add a higher-order folio to the page cache?
Thanks,
Qu