On 2024/7/16 1:19, Alexander Potapenko wrote:
> On Fri, Jul 12, 2024 at 4:08 AM mawupeng <mawupeng1@xxxxxxxxxx> wrote:
>>
>> Hi maintainers,
>>
>> kindly ping.
>>
>> On 2024/6/18 14:40, Wupeng Ma wrote:
>>> Hi maintainers,
>>>
>>> During our testing, we discovered that kasan vmalloc may trigger a false
>>> vmalloc-out-of-bounds warning due to a race between kasan_populate_vmalloc_pte
>>> and kasan_depopulate_vmalloc_pte.
>>>
>>> cpu0                        cpu1                        cpu2
>>> kasan_populate_vmalloc_pte  kasan_populate_vmalloc_pte  kasan_depopulate_vmalloc_pte
>>>                                                         spin_unlock(&init_mm.page_table_lock);
>>> pte_none(ptep_get(ptep))
>>> // pte is valid here, return here
>>>                                                         pte_clear(&init_mm, addr, ptep);
>>>                             pte_none(ptep_get(ptep))
>>>                             // pte is none here try alloc new pages
>>>                                                         spin_lock(&init_mm.page_table_lock);
>>> kasan_poison
>>> // memset kasan shadow region to 0
>>>                             page = __get_free_page(GFP_KERNEL);
>>>                             __memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);
>>>                             pte = pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL);
>>>                             spin_lock(&init_mm.page_table_lock);
>>>                             set_pte_at(&init_mm, addr, ptep, pte);
>>>                             spin_unlock(&init_mm.page_table_lock);
>>>
>>>
>>> After the race, the kasan shadow memory that cpu0 relies on has been set to
>>> 0xf0 by cpu1, which marks it as not initialized. Consequently, a false
>>> vmalloc-out-of-bounds warning is triggered when a user attempts to access
>>> this memory region.
>>>
>>> The root cause is that the pte valid check at the start of
>>> kasan_populate_vmalloc_pte is not protected by page_table_lock. Removing the
>>> check would close the race, but it may result in severe performance
>>> degradation since pages would be frequently allocated and freed.
>>>
>>> Are there any thoughts on how to solve this issue?
>>>
>>> Thank you.
>
> I am going to take a closer look at this issue. Any chance you have a
> reproducer for it?

Not so far. I am trying to build a reproducer, but have made little progress.
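
In the absence of a real reproducer, the interleaving from the original report
can at least be replayed deterministically in userspace. The sketch below is
only a single-threaded model, not kernel code: struct model_pte, replay_race()
and MODEL_VMALLOC_INVALID are hypothetical names, the poison byte is a
placeholder (the real value depends on the KASAN mode), and it assumes that
cpu0's shadow write lands on the page that was mapped when cpu0 did its
unlocked check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096
/* Placeholder poison byte; stands in for KASAN_VMALLOC_INVALID. */
#define MODEL_VMALLOC_INVALID 0xF8

/* One shadow PTE: NULL models pte_none(), otherwise the mapped shadow page. */
struct model_pte {
	uint8_t *page;
};

/* Models the unlocked pte_none() check at the start of the populate path. */
static int pte_is_valid(const struct model_pte *pte)
{
	return pte->page != NULL;
}

/* Models __get_free_page() followed by the memset to the invalid marker. */
static uint8_t *alloc_poisoned_page(void)
{
	uint8_t *page = malloc(MODEL_PAGE_SIZE);

	assert(page != NULL);
	memset(page, MODEL_VMALLOC_INVALID, MODEL_PAGE_SIZE);
	return page;
}

/*
 * Replays the reported interleaving step by step and returns the shadow
 * byte that is visible through the PTE once all three CPUs are done.
 */
uint8_t replay_race(void)
{
	/* Shadow page left mapped by an earlier populate. */
	struct model_pte pte = { .page = alloc_poisoned_page() };
	uint8_t *old_page, *new_page, seen;

	/* cpu0: unlocked check sees a valid PTE, returns without allocating. */
	assert(pte_is_valid(&pte));

	/* cpu2: depopulate clears the PTE (page is freed at the end here). */
	old_page = pte.page;
	pte.page = NULL;

	/* cpu1: unlocked check now sees pte_none, so it will allocate. */
	assert(!pte_is_valid(&pte));

	/* cpu0: marks the shadow accessible (0). ASSUMPTION: hits old page. */
	memset(old_page, 0, MODEL_PAGE_SIZE);

	/* cpu1: installs a fresh page filled with the invalid marker. */
	new_page = alloc_poisoned_page();
	pte.page = new_page;

	/* A later access consults the shadow through the current PTE. */
	seen = pte.page[0];

	free(old_page);
	free(new_page);
	return seen;
}
```

Calling replay_race() from a small main() returns the invalid marker rather
than 0, which is the state that would be reported as a false
vmalloc-out-of-bounds access. The model says nothing about how often the
window is hit on real hardware.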