On Thu, Aug 20, 2020 at 11:03:41AM +0200, Michal Hocko wrote:
> From 73a40589cab12122170fb9f90222982e81d41423 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Thu, 20 Aug 2020 10:44:58 +0200
> Subject: [PATCH] memcg: fix use-after-free in uncharge_batch
>
> syzbot has reported a use-after-free in the uncharge_batch path:
> BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
> BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
> BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
> BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
> BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
> Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304
>
> CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1f0/0x31e lib/dump_stack.c:118
>  print_address_description+0x66/0x620 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report+0x132/0x1d0 mm/kasan/report.c:530
>  check_memory_region_inline mm/kasan/generic.c:183 [inline]
>  check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
>  instrument_atomic_write include/linux/instrumented.h:71 [inline]
>  atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
>  atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
>  page_counter_cancel mm/page_counter.c:54 [inline]
>  page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
>  uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
>  uncharge_page+0x115/0x430 mm/memcontrol.c:6796
>  uncharge_list mm/memcontrol.c:6835 [inline]
>  mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
>  release_pages+0x13a2/0x1550 mm/swap.c:911
>  tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
>  tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
>  tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
>  tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
>  exit_mmap+0x296/0x550 mm/mmap.c:3185
>  __mmput+0x113/0x370 kernel/fork.c:1076
>  exit_mm+0x4cd/0x550 kernel/exit.c:483
>  do_exit+0x576/0x1f20 kernel/exit.c:793
>  do_group_exit+0x161/0x2d0 kernel/exit.c:903
>  get_signal+0x139b/0x1d30 kernel/signal.c:2743
>  arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
>  exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
>  exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
>  syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page
> accounting") has reworked the memcg lifetime to be bound to the struct
> page rather than to charges. It has also removed the css_put_many from
> uncharge_batch, and that is causing the above splat. uncharge_batch is
> supposed to uncharge the accumulated charges for all pages freed from
> the same memcg. The queuing is done by uncharge_page, which, however,
> drops the memcg reference after it adds the charges to the batch. If
> the current page happens to be the last one holding a reference to its
> memcg, then the memcg is free to go, and the next page to be freed will
> trigger a batched uncharge that needs to access the memcg, which is
> already gone.
>
> Fix the issue by taking a reference for the memcg in the current batch.
>
> Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting")
> Reported-by: syzbot+b305848212deec86eabe@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: syzbot+b5ea6fb6f139c8b9482b@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

Nice catch! The fix looks correct: ug now holds a reference count for its ug->memcg pointer.

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>