Re: [PATCH v2 5/6] mm/list_lru: split the lock to per-cgroup scope

Usama Arif <usamaarif642@xxxxxxxxx> · Mon, 28 Oct 2024 13:22:25 +0000

On 27/10/2024 17:26, Kairui Song wrote:
> Hi Usama,
> 
>>
>> Hi Kairui,
>>
>> I was testing zswap writeback in mm-unstable, and I think this patch might be breaking things.
>>
>> I have added the panic below:
>>
>> [  130.051024] ------------[ cut here ]------------
>> [  130.051489] kernel BUG at mm/list_lru.c:321!
>> [  130.051732] Oops: invalid opcode: 0000 [#1] SMP
>> [  130.052133] CPU: 1 UID: 0 PID: 4976 Comm: cc1 Not tainted 6.12.0-rc1-00084-g278bd01cdaf1 #276
>> [  130.052595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.el9 04/01/2014
>> [  130.053276] RIP: 0010:__list_lru_walk_one+0x1ae/0x1b0
>> [  130.053983] Code: 7c 24 78 00 74 03 fb eb 00 48 89 d8 48 83 c4 40 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 c6 07 00 eb e8 41 c6 07 00 fb eb e1 0f 0b <0f> 0b 0f 1f 44 00 00 6a 01 e8 44 fe ff ff 48 83 c4 08 c3 66 2e 0f
>> [  130.055557] RSP: 0000:ffffc90004a2b9a0 EFLAGS: 00010246
>> [  130.056084] RAX: ffff88805dedf6e8 RBX: 0000000000000071 RCX: 0000000000000005
>> [  130.057407] RDX: 0000000000000000 RSI: 0000000000000022 RDI: ffff888008a26400
>> [  130.057794] RBP: ffff88805dedf6d0 R08: 0000000000000402 R09: 0000000000000001
>> [  130.058579] R10: ffffc90004a2b7e8 R11: 0000000000000000 R12: ffffffff81342930
>> [  130.058962] R13: ffff888017532ca0 R14: ffffc90004a2bae8 R15: ffff8880175322c8
>> [  130.059773] FS:  00007ff3f1e21f00(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
>> [  130.060242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  130.060563] CR2: 00007f428e2e2ed8 CR3: 0000000067db6001 CR4: 0000000000770ef0
>> [  130.060952] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  130.061658] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  130.062425] PKRU: 55555554
>> [  130.062578] Call Trace:
>> [  130.062720]  <TASK>
>> [  130.062941]  ? __die_body+0x66/0xb0
>> [  130.063145]  ? die+0x88/0xb0
>> [  130.063309]  ? do_trap+0x9d/0x170
>> [  130.063499]  ? __list_lru_walk_one+0x1ae/0x1b0
>> [  130.063745]  ? __list_lru_walk_one+0x1ae/0x1b0
>> [  130.063995]  ? handle_invalid_op+0x65/0x80
>> [  130.064223]  ? __list_lru_walk_one+0x1ae/0x1b0
>> [  130.064467]  ? exc_invalid_op+0x2f/0x40
>> [  130.064681]  ? asm_exc_invalid_op+0x16/0x20
>> [  130.064912]  ? zswap_shrinker_count+0x1c0/0x1c0
>> [  130.065172]  ? __list_lru_walk_one+0x1ae/0x1b0
>> [  130.065417]  list_lru_walk_one+0xc/0x20
>> [  130.065630]  zswap_shrinker_scan+0x4b/0x80
>> [  130.065856]  do_shrink_slab+0x15f/0x2f0
>> [  130.066075]  shrink_slab+0x2bf/0x3d0
>> [  130.066276]  shrink_node+0x4f0/0x8a0
>> [  130.066477]  do_try_to_free_pages+0x131/0x4d0
>> [  130.066717]  try_to_free_mem_cgroup_pages+0x143/0x220
>> [  130.067000]  try_charge_memcg+0x22a/0x610
>> [  130.067224]  __mem_cgroup_charge+0x74/0x100
>> [  130.068060]  do_pte_missing+0xaa8/0x1020
>> [  130.068280]  handle_mm_fault+0x75d/0x1120
>> [  130.068502]  do_user_addr_fault+0x1c2/0x6f0
>> [  130.068802]  exc_page_fault+0x4f/0xb0
>> [  130.069014]  asm_exc_page_fault+0x22/0x30
>> [  130.069240] RIP: 0033:0x7ff3f19ede49
>> [  130.069441] Code: c9 62 e1 7f 29 7f 00 c3 66 0f 1f 84 00 00 00 00 00 40 0f b6 c6 48 89 d1 48 89 fa f3 aa 48 89 d0 c3 48 3b 15 c9 a3 06 00 77 e7 <62> e1 fe 28 7f 07 62 e1 fe 28 7f 47 01 48 81 fa 80 00 00 00 76 89
>> [  130.070477] RSP: 002b:00007ffc5c818078 EFLAGS: 00010283
>> [  130.070830] RAX: 00007ff3efac9000 RBX: 00007ff3f02d1940 RCX: 0000000000000001
>> [  130.071522] RDX: 00000000000005a8 RSI: 0000000000000000 RDI: 00007ff3efac9000
>> [  130.072146] RBP: 00007ffc5c8180c0 R08: 0000000003007320 R09: 0000000000000007
>> [  130.072594] R10: 0000000003007320 R11: 0000000000000012 R12: 00007ff3f1f0e000
>> [  130.072981] R13: 000000007ffa1e74 R14: 00000000000005a8 R15: 00000000000000b5
>> [  130.073369]  </TASK>
>> [  130.073496] Modules linked in:
>> [  130.073701] ---[ end trace 0000000000000000 ]---
>> [  130.073960] RIP: 0010:__list_lru_walk_one+0x1ae/0x1b0
>> [  130.074319] Code: 7c 24 78 00 74 03 fb eb 00 48 89 d8 48 83 c4 40 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 c6 07 00 eb e8 41 c6 07 00 fb eb e1 0f 0b <0f> 0b 0f 1f 44 00 00 6a 01 e8 44 fe ff ff 48 83 c4 08 c3 66 2e 0f
>> [  130.075564] RSP: 0000:ffffc90004a2b9a0 EFLAGS: 00010246
>> [  130.075897] RAX: ffff88805dedf6e8 RBX: 0000000000000071 RCX: 0000000000000005
>> [  130.076342] RDX: 0000000000000000 RSI: 0000000000000022 RDI: ffff888008a26400
>> [  130.076739] RBP: ffff88805dedf6d0 R08: 0000000000000402 R09: 0000000000000001
>> [  130.077192] R10: ffffc90004a2b7e8 R11: 0000000000000000 R12: ffffffff81342930
>> [  130.077739] R13: ffff888017532ca0 R14: ffffc90004a2bae8 R15: ffff8880175322c8
>> [  130.078149] FS:  00007ff3f1e21f00(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
>> [  130.078764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  130.079095] CR2: 00007f428e2e2ed8 CR3: 0000000067db6001 CR4: 0000000000770ef0
>> [  130.079521] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  130.080009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  130.080402] PKRU: 55555554
>> [  130.080713] Kernel panic - not syncing: Fatal exception
>> [  130.081198] Kernel Offset: disabled
>> [  130.081396] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Thanks,
>> Usama
>>
> 
> Thanks for the report. I converted the list_lru_walk callbacks to leave the
> list unlocked when they return LRU_RETRY or LRU_REMOVED_RETRY, but I
> didn't notice that shrink_memcg_cb in zswap.c can also return LRU_STOP
> after it has unlocked the list.
> 
> The fix should be simple. Is it easy to reproduce? Can you help verify it?
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 79c2d21504a2..1a3caf4c4e14 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -298,9 +298,9 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>                 ret = isolate(item, l, cb_arg);
>                 switch (ret) {
>                 /*
> -                * LRU_RETRY and LRU_REMOVED_RETRY will drop the lru lock,
> -                * the list traversal will be invalid and have to restart from
> -                * scratch.
> +                * LRU_RETRY, LRU_REMOVED_RETRY and LRU_STOP will drop the lru
> +                * lock, the list traversal will be invalid and have to restart
> +                * from scratch.
>                  */
>                 case LRU_RETRY:
>                         goto restart;
> @@ -318,14 +318,13 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>                 case LRU_SKIP:
>                         break;
>                 case LRU_STOP:
> -                       assert_spin_locked(&l->lock);
>                         goto out;
>                 default:
>                         BUG();
>                 }
>         }
> -out:
>         unlock_list_lru(l, irq_off);
> +out:
>         return isolated;
>  }
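The locking contract behind this fix can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: a bool stands in for the per-list spinlock, assert() stands in for assert_spin_locked()/BUG(), and model_list, model_lock(), model_unlock(), model_walk(), struct script and scripted_isolate() are all hypothetical names. The status names mirror the kernel's enum lru_status, and the assumed contract is the one described in this thread: a callback returning LRU_RETRY, LRU_REMOVED_RETRY or LRU_STOP has already dropped the list lock. With the "out:" label placed after the unlock, as in the patch, the LRU_STOP path returns without unlocking a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only: names mirror the kernel's enum lru_status. */
enum lru_status {
	LRU_REMOVED,		/* counted as isolated, lock still held */
	LRU_REMOVED_RETRY,	/* counted as isolated, callback dropped the lock */
	LRU_ROTATE,		/* kept, lock still held */
	LRU_SKIP,		/* skipped, lock still held */
	LRU_RETRY,		/* kept, callback dropped the lock */
	LRU_STOP,		/* stop walking, callback dropped the lock */
};

/* The model never unlinks items; it only counts isolations. */
struct model_list {
	bool locked;		/* stands in for the per-list spinlock */
	int nr_items;
};

typedef enum lru_status (*isolate_fn)(struct model_list *l, int idx, void *arg);

/* assert() plays the role of assert_spin_locked()/BUG() here. */
static void model_lock(struct model_list *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void model_unlock(struct model_list *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Walker with the patched layout: "out" sits after the unlock. */
static int model_walk(struct model_list *l, isolate_fn isolate, void *arg,
		      int *nr_to_walk)
{
	int isolated = 0;

restart:
	model_lock(l);
	for (int i = 0; i < l->nr_items; i++) {
		if (*nr_to_walk == 0)
			break;
		--*nr_to_walk;

		switch (isolate(l, i, arg)) {
		case LRU_REMOVED_RETRY:
			isolated++;
			goto restart;	/* lock already dropped, start over */
		case LRU_RETRY:
			goto restart;	/* lock already dropped, start over */
		case LRU_REMOVED:
			isolated++;
			break;
		case LRU_ROTATE:
		case LRU_SKIP:
			break;
		case LRU_STOP:
			goto out;	/* lock already dropped: skip the unlock */
		default:
			abort();	/* plays the role of BUG() */
		}
	}
	model_unlock(l);
out:
	return isolated;
}

/* Test callback: replays a fixed sequence of return codes and drops the
 * lock for RETRY, REMOVED_RETRY and STOP, per the assumed contract. */
struct script {
	const enum lru_status *steps;
	int pos;
};

static enum lru_status scripted_isolate(struct model_list *l, int idx, void *arg)
{
	struct script *s = arg;
	enum lru_status ret = s->steps[s->pos++];

	(void)idx;
	if (ret == LRU_RETRY || ret == LRU_REMOVED_RETRY || ret == LRU_STOP)
		model_unlock(l);
	return ret;
}
```

Before the patch, the LRU_STOP case expected the lock to still be held (the removed assert_spin_locked(), which appears to be the line reported as mm/list_lru.c:321) and then fell through to the unlock at "out:". In this model, restoring that layout would make model_unlock() fire its assert on a list that scripted_isolate() had already unlocked.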

Hi Kairui,

With this fix there are no more crashes. Thanks for the quick fix.

FYI, to test it, enable zswap and the zswap shrinker
(echo Y > /sys/module/zswap/parameters/shrinker_enabled)
and build the kernel in a memory-constrained cgroup
(memory.max set to 1G).
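The reproduction setup can also be scripted. The following is a hedged C sketch under stated assumptions: write_str() and setup_zswap_repro() are hypothetical helpers, it must run as root, cgroup v2 is assumed to be mounted at /sys/fs/cgroup with the memory controller enabled in the parent's cgroup.subtree_control, and the cgroup directory name is up to the caller. It also writes /sys/module/zswap/parameters/enabled, which "enable zswap" implies but the message does not spell out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to a sysfs/cgroupfs file: 0 on success, -1 on error. */
static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	int put_failed = (fputs(val, f) == EOF);
	/* sysfs/cgroupfs report a rejected value when the buffer is flushed */
	if (fclose(f) != 0 || put_failed)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: enable zswap and its shrinker, create a cgroup v2
 * group at cg_dir capped at 1G, and move the calling process into it.
 * Assumes root, cgroup v2 at /sys/fs/cgroup and the memory controller
 * enabled in the parent's cgroup.subtree_control.
 */
static int setup_zswap_repro(const char *cg_dir)
{
	char path[256], pid[32];

	assert(cg_dir != NULL);
	if (write_str("/sys/module/zswap/parameters/enabled", "Y") ||
	    write_str("/sys/module/zswap/parameters/shrinker_enabled", "Y"))
		return -1;
	if (mkdir(cg_dir, 0755) != 0 && errno != EEXIST)
		return -1;
	snprintf(path, sizeof(path), "%s/memory.max", cg_dir);
	if (write_str(path, "1G"))
		return -1;
	snprintf(path, sizeof(path), "%s/cgroup.procs", cg_dir);
	snprintf(pid, sizeof(pid), "%ld", (long)getpid());
	return write_str(path, pid);
}
```

A root process would call, for example, setup_zswap_repro("/sys/fs/cgroup/zswap-test") and then exec the kernel build (for instance make via execvp()) from the same process, so that the build inherits the 1G limit and drives memcg reclaim into the zswap shrinker.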

Thanks,
Usama