Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187

Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> · Wed, 2 Oct 2024 18:55:59 +0100

Thanks for your report!

On Wed, Oct 02, 2024 at 10:34:32PM GMT, Mikhail Gavrilov wrote:
> On Wed, Sep 25, 2024 at 3:28 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@xxxxxxxxx> wrote:
> >
> > Hi,
> > I am testing kernel snapshots on Fedora Rawhide and Today with build
> > on commit de5cb0dcb74c I saw for the first time "KASAN:
> > slab-use-after-free in m_next+0x13b".
> > Unfortunately it is not clear what triggered this problem because it
> > happened after 21 hour uptime.
> >
> > Full trace looks like:
> > input: Noble FoKus Mystique (AVRCP) as /devices/virtual/input/input26
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in m_next+0x13b/0x170
> > Read of size 8 at addr ffff8885609b40f0 by task htop/3847
> >
> > CPU: 14 UID: 1000 PID: 3847 Comm: htop Tainted: G        W    L
> > -------  ---  6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64+debug
> > #1
> > Tainted: [W]=WARN, [L]=SOFTLOCKUP
> > Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
> > BIOS 3040 09/12/2024
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x84/0xd0
> >  ? m_next+0x13b/0x170
> >  print_report+0x174/0x505
> >  ? m_next+0x13b/0x170
> >  ? __virt_addr_valid+0x231/0x420
> >  ? m_next+0x13b/0x170
> >  kasan_report+0xab/0x180
> >  ? m_next+0x13b/0x170
> >  m_next+0x13b/0x170
> >  seq_read_iter+0x8e5/0x1130
> >  seq_read+0x2b4/0x3c0
> >  ? __pfx_seq_read+0x10/0x10
> >  ? inode_security+0x54/0xf0
> >  ? rw_verify_area+0x3b2/0x5e0
> >  vfs_read+0x165/0xa20
> >  ? __pfx_vfs_read+0x10/0x10
> >  ? ktime_get_coarse_real_ts64+0x41/0xd0
> >  ? local_clock_noinstr+0xd/0x100
> >  ? __pfx_lock_release+0x10/0x10
> >  ksys_read+0xfb/0x1d0
> >  ? __pfx_ksys_read+0x10/0x10
> >  ? ktime_get_coarse_real_ts64+0x41/0xd0
> >  do_syscall_64+0x97/0x190
> >  ? __lock_acquire+0xdcd/0x62c0
> >  ? __pfx___lock_acquire+0x10/0x10
> >  ? __pfx___lock_acquire+0x10/0x10
> >  ? __pfx___lock_acquire+0x10/0x10
> >  ? audit_filter_inodes.part.0+0x12d/0x220
> >  ? local_clock_noinstr+0xd/0x100
> >  ? __pfx_lock_release+0x10/0x10
> >  ? rcu_is_watching+0x12/0xc0
> >  ? kfree+0x27c/0x4d0
> >  ? audit_reset_context+0x8c5/0xee0
> >  ? lockdep_hardirqs_on_prepare+0x171/0x400
> >  ? do_syscall_64+0xa3/0x190
> >  ? lockdep_hardirqs_on+0x7c/0x100
> >  ? do_syscall_64+0xa3/0x190
> >  ? do_syscall_64+0xa3/0x190
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > RIP: 0033:0x7f4190dcac36
> > Code: 89 df e8 2d c1 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 15
> > 83 e2 39 83 fa 08 75 0d e8 32 ff ff ff 66 90 48 8b 45 10 0f 05 <48> 8b
> > 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08
> > RSP: 002b:00007ffcde82b690 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
> > RAX: ffffffffffffffda RBX: 00007f4190ce3740 RCX: 00007f4190dcac36
> > RDX: 0000000000000400 RSI: 000055bf5e823a20 RDI: 0000000000000005
> > RBP: 00007ffcde82b6a0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000202 R12: 00007f4190f44fd0
> > R13: 00007f4190f44e80 R14: 000055bf5e823e20 R15: 000055bf5ecc9160
> >  </TASK>
> >
> > Allocated by task 176289:
> >  kasan_save_stack+0x30/0x50
> >  kasan_save_track+0x14/0x30
> >  __kasan_slab_alloc+0x6e/0x70
> >  kmem_cache_alloc_noprof+0x15a/0x3d0
> >  vm_area_dup+0x23/0x190
> >  __split_vma+0x137/0xd40
> >  vms_gather_munmap_vmas+0x29d/0xfc0
> >  mmap_region+0x35a/0x1f50
> >  do_mmap+0x8e7/0x1020
> >  vm_mmap_pgoff+0x178/0x2f0
> >  __do_fast_syscall_32+0x86/0x110
> >  do_fast_syscall_32+0x32/0x80
> >  sysret32_from_system_call+0x0/0x4a
> >
> > Freed by task 0:
> >  kasan_save_stack+0x30/0x50
> >  kasan_save_track+0x14/0x30
> >  kasan_save_free_info+0x3b/0x70
> >  __kasan_slab_free+0x37/0x50
> >  kmem_cache_free+0x1a7/0x5a0
> >  rcu_do_batch+0x3fd/0x1120
> >  rcu_core+0x636/0x9b0
> >  handle_softirqs+0x1e9/0x8d0
> >  __irq_exit_rcu+0xbb/0x1c0
> >  irq_exit_rcu+0xe/0x30
> >  sysvec_apic_timer_interrupt+0xa1/0xd0
> >  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> >
> > Last potentially related work creation:
> >  kasan_save_stack+0x30/0x50
> >  __kasan_record_aux_stack+0x8e/0xa0
> >  __call_rcu_common.constprop.0+0xf4/0x10d0
> >  vma_complete+0x720/0x10b0
> >  commit_merge+0x42a/0x1310
> >  vma_expand+0x313/0xad0
> >  vma_merge_new_range+0x2cd/0xec0
> >  mmap_region+0x432/0x1f50
> >  do_mmap+0x8e7/0x1020
> >  vm_mmap_pgoff+0x178/0x2f0
> >  __do_fast_syscall_32+0x86/0x110
> >  do_fast_syscall_32+0x32/0x80
> >  sysret32_from_system_call+0x0/0x4a
> >
> > The buggy address belongs to the object at ffff8885609b40f0
> >  which belongs to the cache vm_area_struct of size 176
> > The buggy address is located 0 bytes inside of
> >  freed 176-byte region [ffff8885609b40f0, ffff8885609b41a0)
> >
> > The buggy address belongs to the physical page:
> > page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5609b4
> > head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> > memcg:ffff88814d36d001
> > flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
> > page_type: f5(slab)
> > raw: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122
> > raw: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001
> > head: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122
> > head: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001
> > head: 0017ffffc0000001 ffffea0015826d01 ffffffffffffffff 0000000000000000
> > head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> >  ffff8885609b3f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >  ffff8885609b4000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >ffff8885609b4080: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fa fb
> >                                                              ^
> >  ffff8885609b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >  ffff8885609b4180: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00
> > ==================================================================
> > Disabling lock debugging due to kernel taint
> >
> > > sh /usr/src/kernels/(uname -r)/scripts/faddr2line /lib/debug/lib/modules/(uname -r)/vmlinux m_next+0x13b
> > m_next+0x13b/0x170:
> > proc_get_vma at fs/proc/task_mmu.c:136
> > (inlined by) m_next at fs/proc/task_mmu.c:187
> >
> > > cat -n /usr/src/debug/kernel-6.11-8833-gde5cb0dcb74c/linux-6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64/fs/proc/task_mmu.c | sed -n '182,192 p'
> >    182 {
> >    183 if (*ppos == -2UL) {
> >    184 *ppos = -1UL;
> >    185 return NULL;
> >    186 }
> >    187 return proc_get_vma(m->private, ppos);
> >    188 }
> >    189
> >    190 static void m_stop(struct seq_file *m, void *v)
> >    191 {
> >    192 struct proc_maps_private *priv = m->private;
> >
> > > git blame fs/proc/task_mmu.c -L 182,192
> > Blaming lines: 100% (11/11), done.
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 182) {
> > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 183)
> >  if (*ppos == -2UL) {
> > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 184)
> >          *ppos = -1UL;
> > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 185)
> >          return NULL;
> > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 186)   }
> > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 187)
> >  return proc_get_vma(m->private, ppos);
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 188) }
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 189)
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 190)
> > static void m_stop(struct seq_file *m, void *v)
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 191) {
> > a6198797cc3fd (Matt Mackall            2008-02-04 22:29:03 -0800 192)
> >  struct proc_maps_private *priv = m->private;
> >
> > Hmm this line hasn't changed for two years.
> >
> > Machine spec: https://linux-hardware.org/?probe=323b76ce48
> > I attached below full kernel log and build config.
> >
> > Can anyone figure out what happened or should we wait for the second
> > manifestation of this issue?
> >
>
> Finally I spotted that this issue is caused by the Steam client.
> And usually happens after downloading game updates.
> Looks like Steam client runs some post update scripts which cause
> slab-use-after-free in m_next.

Yeah, a similar issue is being investigated elsewhere; see

https://lore.kernel.org/all/c63a64a9-cdee-4586-85ba-800e8e1a8054@lucifer.local/

for the latest update.

That investigation is ongoing. It also involves Steam, also points at this
commit, and also seems related to a Steam update doing something strange; so
strange that I literally can't repro it locally :) but Bert in that thread can.

We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and
CONFIG_DEBUG_MAPLE_TREE set. If you set these, you should see a report more
quickly (let us know if you do).
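
As a rough sketch, the corresponding .config fragment would look something
like the below (this assumes CONFIG_DEBUG_KERNEL is already enabled in your
build; adjust to however you generate your config):

    CONFIG_DEBUG_VM=y
    CONFIG_DEBUG_VM_MAPLE_TREE=y
    CONFIG_DEBUG_MAPLE_TREE=y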

Also note that there is a critical error-handling fix in

https://lore.kernel.org/linux-mm/20241002073932.13482-1-lorenzo.stoakes@xxxxxxxxxx/

which should get hotfixed soon.

>
> Git bisect found the first bad commit:
> commit f8d112a4e657c65c888e6b8a8435ef61a66e4ab8 (HEAD)
> Author: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> Date:   Fri Aug 30 00:00:54 2024 -0400
>
>     mm/mmap: avoid zeroing vma tree in mmap_region()
>
>     Instead of zeroing the vma tree and then overwriting the area, let the
>     area be overwritten and then clean up the gathered vmas using
>     vms_complete_munmap_vmas().
>
>     To ensure locking is downgraded correctly, the mm is set regardless of
>     MAP_FIXED or not (NULL vma).
>
>     If a driver is mapping over an existing vma, then clear the ptes before
>     the call_mmap() invocation.  This is done using the vms_clean_up_area()
>     helper.  If there is a close vm_ops, that must also be called to ensure
>     any cleanup is done before mapping over the area.  This also means that
>     calling open has been added to the abort of an unmap operation, for now.
>
>     Since vm_ops->open() and vm_ops->close() are not always undo each other
>     (state cleanup may exist in ->close() that is lost forever), the code
>     cannot be left in this way, but that change has been isolated to another
>     commit to make this point very obvious for traceability.
>
>     Temporarily keep track of the number of pages that will be removed and
>     reduce the charged amount.
>
>     This also drops the validate_mm() call in the vma_expand() function.  It
>     is necessary to drop the validate as it would fail since the mm map_count
>     would be incorrect during a vma expansion, prior to the cleanup from
>     vms_complete_munmap_vmas().
>
>     Clean up the error handing of the vms_gather_munmap_vmas() by calling the
>     verification within the function.
>
>     Link: https://lkml.kernel.org/r/20240830040101.822209-15-Liam.Howlett@xxxxxxxxxx
>     Signed-off-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
>     Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
>     Cc: Bert Karwatzki <spasswolf@xxxxxx>
>     Cc: Jeff Xu <jeffxu@xxxxxxxxxxxx>
>     Cc: Jiri Olsa <olsajiri@xxxxxxxxx>
>     Cc: Kees Cook <kees@xxxxxxxxxx>
>     Cc: Lorenzo Stoakes <lstoakes@xxxxxxxxx>
>     Cc: Mark Brown <broonie@xxxxxxxxxx>
>     Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
>     Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
>     Cc: Paul Moore <paul@xxxxxxxxxxxxxx>
>     Cc: Sidhartha Kumar <sidhartha.kumar@xxxxxxxxxx>
>     Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
>     Cc: Vlastimil Babka <vbabka@xxxxxxx>
>     Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
>  mm/mmap.c | 57 +++++++++++++++++++++++++++------------------------------
>  mm/vma.c  | 54 ++++++++++++++++++++++++++++++++++++++++++------------
>  mm/vma.h  | 22 ++++++++++++++++------
>  3 files changed, 85 insertions(+), 48 deletions(-)
>
> --
> Best Regards,
> Mike Gavrilov.