On Mon, Oct 14, 2024 at 10:17 AM Jeongjun Park <aha310510@xxxxxxxxx> wrote: > > Kairui Song <ryncsn@xxxxxxxxx> wrote: > > > > On Mon, Oct 7, 2024 at 3:06 PM Jeongjun Park <aha310510@xxxxxxxxx> wrote: > >> > >> A report [1] was uploaded from syzbot. > >> > >> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip > >> slot cache"), the __try_to_reclaim_swap() function reads offset and folio->entry > >> from folio without folio_lock protection. > >> > >> In the currently reported KCSAN log, it is assumed that the actual data-race > >> will not occur because the calltrace that does WRITE already obtains the > >> folio_lock and then writes. > >> > >> However, the existing __try_to_reclaim_swap() function was already implemented > >> to perform reads under folio_lock protection [1], and there is a risk of a > >> data-race occurring through a function other than the one shown in the KCSAN > >> log. > >> > >> Therefore, I think it is appropriate to change read operations for > >> folio to be performed under folio_lock. > >> > >> [1] > >> > >> ================================================================== > >> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap > >> > >> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0: > >> __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163 > >> delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243 > >> folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850 > >> free_swap_cache mm/swap_state.c:293 [inline] > >> free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325 > >> __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] > >> tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] > >> tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] > >> tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373 > >> zap_pte_range mm/memory.c:1700 [inline] > >> zap_pmd_range mm/memory.c:1739 [inline] > >> zap_pud_range mm/memory.c:1768 [inline] > >> zap_p4d_range mm/memory.c:1789 [inline] > >> unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810 > >> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856 > >> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900 > >> exit_mmap+0x18a/0x690 mm/mmap.c:1864 > >> __mmput+0x28/0x1b0 kernel/fork.c:1347 > >> mmput+0x4c/0x60 kernel/fork.c:1369 > >> exit_mm+0xe4/0x190 kernel/exit.c:571 > >> do_exit+0x55e/0x17f0 kernel/exit.c:926 > >> do_group_exit+0x102/0x150 kernel/exit.c:1088 > >> get_signal+0xf2a/0x1070 kernel/signal.c:2917 > >> arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337 > >> exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > >> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] > >> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > >> syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218 > >> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89 > >> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >> > >> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1: > >> __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198 > >> free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915 > >> zap_pte_range mm/memory.c:1656 [inline] > >> zap_pmd_range mm/memory.c:1739 [inline] > >> zap_pud_range mm/memory.c:1768 [inline] > >> zap_p4d_range mm/memory.c:1789 [inline] > >> unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810 > >> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856 > >> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900 > >> exit_mmap+0x18a/0x690 mm/mmap.c:1864 > >> __mmput+0x28/0x1b0 kernel/fork.c:1347 > >> mmput+0x4c/0x60 kernel/fork.c:1369 > >> exit_mm+0xe4/0x190 kernel/exit.c:571 > >> do_exit+0x55e/0x17f0 kernel/exit.c:926 > >> __do_sys_exit kernel/exit.c:1055 [inline] > >> __se_sys_exit kernel/exit.c:1053 [inline] > >> __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053 > >> x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61 > >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] > >> do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83 > >> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >> > >> value changed: 0x0000000000000242 -> 0x0000000000000000 > >> > >> Reported-by: syzbot+fa43f1b63e3aa6f66329@xxxxxxxxxxxxxxxxxxxxxxxxx > >> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache") > >> Signed-off-by: Jeongjun Park <aha310510@xxxxxxxxx> > >> --- > >> mm/swapfile.c | 7 ++++--- > >> 1 file changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/mm/swapfile.c b/mm/swapfile.c > >> index 0cded32414a1..eb782fcd5627 100644 > >> --- a/mm/swapfile.c > >> +++ b/mm/swapfile.c > >> @@ -194,9 +194,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, > >> if (IS_ERR(folio)) > >> return 0; > >> > >> - /* offset could point to the middle of a large folio */ > >> - entry = folio->swap; > >> - offset = swp_offset(entry); > >> nr_pages = folio_nr_pages(folio); > >> ret = -nr_pages; > >> > >> @@ -210,6 +207,10 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si, > >> if (!folio_trylock(folio)) > >> goto out; > >> > >> + /* offset could point to the middle of a large folio */ > >> + entry = folio->swap; > >> + offset = swp_offset(entry); > >> + > >> need_reclaim = ((flags & TTRS_ANYWAY) || > >> ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) || > >> ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio))); > >> -- > > > > Reviewed-by: Kairui Song <kasong@xxxxxxxxxxx> > > > > Hi Andrew, > > > > Will this be added to stable and 6.12? 862590ac3708 is already in 6.12 > > and this fixes a potential issue of it. > > As far as I can see, commit 862590ac3708 was applied starting > from 6.12-rc1, so it looks like no additional commits are needed > for the stable version. Hi, sorry for the confusion, I meant mm-stable, not the stable branch. It's better to merge this in 6.12. > Regards, > > Jeongjun Park