On 10/05/23 23:59, riel@xxxxxxxxxxx wrote:
> From: Rik van Riel <riel@xxxxxxxxxxx>
>
> Malloc libraries, like jemalloc and tcmalloc, take decisions on when
> to call madvise independently from the code in the main application.
>
> This sometimes results in the application page faulting on an address,
> right after the malloc library has shot down the backing memory with
> MADV_DONTNEED.
>
> Usually this is harmless, because we always have some 4kB pages
> sitting around to satisfy a page fault. However, with hugetlbfs,
> systems often allocate only the exact number of huge pages that
> the application wants.
>
> Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of
> any lock taken on the page fault path, which can open up the following
> race condition:
>
>        CPU 1                            CPU 2
>
>        MADV_DONTNEED
>        unmap page
>        shoot down TLB entry
>                                         page fault
>                                         fail to allocate a huge page
>                                         killed with SIGBUS
>        free page
>
> Fix that race by pulling the locking from __unmap_hugepage_final_range
> into helper functions called from zap_page_range_single. This ensures
> page faults stay locked out of the MADV_DONTNEED VMA until the
> huge pages have actually been freed.
>
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: stable@xxxxxxxxxx
> Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing")
> ---
>  include/linux/hugetlb.h | 35 +++++++++++++++++++++++++++++++++--
>  mm/hugetlb.c            | 34 ++++++++++++++++++++++------------
>  mm/memory.c             | 13 ++++++++-----
>  3 files changed, 63 insertions(+), 19 deletions(-)

Thanks for all the revisions,

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
-- 
Mike Kravetz