Re: [PATCH] mm: Fix possible NULL pointer dereference in __swap_duplicate

Barry Song <21cnbao@xxxxxxxxx> · Fri, 14 Feb 2025 12:07:18 +1300

On Thu, Feb 13, 2025 at 9:52 PM gaoxu <gaoxu2@xxxxxxxxx> wrote:
>
> >
> > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@xxxxxxxxx> wrote:
> > >
> > > swp_swap_info() may return null; it is necessary to check the return
> > > value to avoid NULL pointer dereference. The code for other calls to
> > > swp_swap_info() includes checks, and __swap_duplicate() should also
> > > include checks.
> > >
> > > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > > to CPU cache issues or DDR bit flips. The probability of this issue is
> > > very small, and the stack info we encountered is as follows:
> > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
> > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> > > Mem abort info:
> > >   ESR = 0x0000000096000005
> > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > >   SET = 0, FnV = 0
> > >   EA = 0, S1PTW = 0
> > >   FSC = 0x05: level 1 translation fault
> > > Data abort info:
> > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> > > [0000000000000058] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> > > Skip md ftrace buffer dump for: 0x1609e0 ...
> > > pc : swap_duplicate+0x44/0x164
> > > lr : copy_page_range+0x508/0x1e78
> > > sp : ffffffc0f2a699e0
> > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > Call trace:
> > >  swap_duplicate+0x44/0x164
> > >  copy_page_range+0x508/0x1e78
> > >  copy_process+0x1278/0x21cc
> > >  kernel_clone+0x90/0x438
> > >  __arm64_sys_clone+0x5c/0x8c
> > >  invoke_syscall+0x58/0x110
> > >  do_el0_svc+0x8c/0xe0
> > >  el0_svc+0x38/0x9c
> > >  el0t_64_sync_handler+0x44/0xec
> > >  el0t_64_sync+0x1a8/0x1ac
> > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> > > ---[ end trace 0000000000000000 ]---
> > > Kernel panic - not syncing: Oops: Fatal exception
> > > SMP: stopping secondary CPUs
> > >
> > > The patch seems to provide only a workaround, but there is no more
> > > effective software solution for handling bit flips. This
> > > patch will change the issue from a system crash to a process exception,
> > > thereby reducing the impact on the entire machine.
> > >
> > > Signed-off-by: gao xu <gaoxu2@xxxxxxxxx>
> >
> > Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and
> > we (should have?) locked the PTE before resolving it into the swap entry format.
> > Which should have been enough to prevent the swap entry from being
> > unmapped and freed up. Which should have been enough to prevent swapoff...?
> >
> > (are you even doing concurrent swapoff?)
> No, the swapoff operation was not executed.
> >
> > Can you provide more context? What kernel version is this, what kind of
> > workload is this, any reproducer, etc.?
> The kernel version is Linux 6.6 (Android 15, Linux 6.6.30).
>
> The issue was encountered by mobile users during normal usage.
> The system load should not have been high, as the logs contain no
> indication of low memory.
> The issue occurs very rarely and irregularly.
> We cannot reproduce it during stress testing in the lab.
>
> I found similar issues reported on the web; see:
> https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html
> https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.155581/
> https://forums.unraid.net/topic/145497-server-crashes-with-repeated-get_swap_device-bad-swap-file-entry-3ffffffffffff/

It might be a non-swap entry mistakenly passed to swap functions. I remember
fixing a similar issue in the Android Common Kernel 6.6:

https://android.googlesource.com/kernel/common/+/119351fe20bc73b71c6

where a migration entry was mistakenly passed to the swap APIs.

In any case, we need to identify and fix the actual bug.
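
To illustrate the kind of check under discussion, here is a minimal
userspace sketch, not the actual kernel code. Every name in it
(sketch_table, sketch_lookup, sketch_duplicate and so on) is hypothetical,
and the entry layout (type in the bits above bit 57, offset below) is an
assumption made for illustration. Under that assumed layout, the x0 value
from the quoted oops, 0x18000000002d2d2f, decodes to type 6 and offset
0x2d2d2f. If no device is registered for that type, the lookup returns
NULL and the caller has to bail out instead of dereferencing it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout, for illustration only. */
#define SKETCH_TYPE_SHIFT   58
#define SKETCH_MAX_TYPES    32u
#define SKETCH_OFFSET_MASK  ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for a per-device swap descriptor. */
struct sketch_swap_info {
	uint64_t max;	/* number of slots on this device */
};

/* One pointer per type; unregistered types stay NULL. */
static struct sketch_swap_info *sketch_table[SKETCH_MAX_TYPES];

static unsigned int sketch_type(uint64_t val)
{
	return (unsigned int)(val >> SKETCH_TYPE_SHIFT);
}

static uint64_t sketch_offset(uint64_t val)
{
	return val & SKETCH_OFFSET_MASK;
}

/* Returns NULL for an out-of-range or unregistered type. */
static struct sketch_swap_info *sketch_lookup(uint64_t val)
{
	unsigned int type = sketch_type(val);

	if (type >= SKETCH_MAX_TYPES)
		return NULL;
	return sketch_table[type];
}

/* Checks the lookup result before touching the descriptor. */
static int sketch_duplicate(uint64_t val)
{
	struct sketch_swap_info *si = sketch_lookup(val);

	if (!si)
		return -EINVAL;	/* error code chosen for illustration */
	if (sketch_offset(val) >= si->max)
		return -EINVAL;
	return 0;
}
```

Such a check only turns the crash into an error return. It does not
explain how an entry with an unregistered type reached the swap path in
the first place.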


Thanks
Barry