On 09/21/23 15:42, Mike Kravetz wrote:
> On 09/19/23 22:16, riel@xxxxxxxxxxx wrote:
> > From: Rik van Riel <riel@xxxxxxxxxxx>
> >
> > Extend the locking scheme used to protect shared hugetlb mappings
> > from truncate vs page fault races, in order to protect private
> > hugetlb mappings (with resv_map) against MADV_DONTNEED.
> >
> > Add a read-write semaphore to the resv_map data structure, and
> > use that from the hugetlb_vma_(un)lock_* functions, in preparation
> > for closing the race between MADV_DONTNEED and page faults.
> >
> > Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> > ---
> >  include/linux/hugetlb.h |  6 ++++++
> >  mm/hugetlb.c            | 36 ++++++++++++++++++++++++++++++++----
> >  2 files changed, 38 insertions(+), 4 deletions(-)
>
> This looks straight forward.
>
> However, I ran just this patch through libhugetlbfs test suite and it hung on
> misaligned_offset (2M: 32).
> https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/misaligned_offset.c
>
> Added lock/semaphore debugging to the kernel and got:
> [ 38.094690] =========================
> [ 38.095517] WARNING: held lock freed!
> [ 38.096350] 6.6.0-rc2-next-20230921-dirty #4 Not tainted
> [ 38.097556] -------------------------
> [ 38.098439] mlock/1002 is freeing memory ffff8881eff8dc00-ffff8881eff8ddff, with a lock still held there!
> [ 38.100550] ffff8881eff8dce8 (&resv_map->rw_sema){++++}-{3:3}, at: __unmap_hugepage_range_final+0x29/0x120
> [ 38.103564] 2 locks held by mlock/1002:
> [ 38.104552] #0: ffff8881effa42a0 (&mm->mmap_lock){++++}-{3:3}, at: do_vmi_align_munmap+0x5c6/0x650
> [ 38.106611] #1: ffff8881eff8dce8 (&resv_map->rw_sema){++++}-{3:3}, at: __unmap_hugepage_range_final+0x29/0x120
> [ 38.108827]
> [ 38.108827] stack backtrace:
> [ 38.109929] CPU: 0 PID: 1002 Comm: mlock Not tainted 6.6.0-rc2-next-20230921-dirty #4
> [ 38.111812] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [ 38.113784] Call Trace:
> [ 38.114456] <TASK>
> [ 38.115066] dump_stack_lvl+0x57/0x90
> [ 38.116001] debug_check_no_locks_freed+0x137/0x170
> [ 38.117193] ? remove_vma+0x28/0x70
> [ 38.118088] __kmem_cache_free+0x8f/0x2b0
> [ 38.119080] remove_vma+0x28/0x70
> [ 38.119960] do_vmi_align_munmap+0x3b1/0x650
> [ 38.121051] do_vmi_munmap+0xc9/0x1a0
> [ 38.122006] __vm_munmap+0xa4/0x190
> [ 38.122931] __ia32_sys_munmap+0x15/0x20
> [ 38.123926] __do_fast_syscall_32+0x68/0x100
> [ 38.125031] do_fast_syscall_32+0x2f/0x70
> [ 38.126060] entry_SYSENTER_compat_after_hwframe+0x7b/0x8d
> [ 38.127366] RIP: 0023:0xf7f05579
> [ 38.128198] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
> [ 38.132534] RSP: 002b:00000000fffa877c EFLAGS: 00000286 ORIG_RAX: 000000000000005b
> [ 38.135703] RAX: ffffffffffffffda RBX: 00000000f7a00000 RCX: 0000000000200000
> [ 38.137323] RDX: 00000000f7a00000 RSI: 0000000000200000 RDI: 0000000000000003
> [ 38.138965] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
> [ 38.140574] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
> [ 38.142191] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 38.143865] </TASK>
>
> Something is not quite right. If you do not get to it first, I will take a
> look as time permits.

Just for grins I threw on patch 2 (with lock debugging) and ran the test suite.
It gets past misaligned_offset, but is spewing locking warnings too fast to
read. Something is certainly missing.
-- 
Mike Kravetz