On Sun, Aug 4, 2024 at 1:33 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, 4 Aug 2024 at 01:59, kernel test robot <oliver.sang@xxxxxxxxx> wrote:
> >
> > kernel test robot noticed a -4.4% regression of stress-ng.pagemove.page_remaps_per_sec on
> > commit 8be7258aad44 ("mseal: add mseal syscall")
>
> Ok, it's basically just the vma walk in can_modify_mm():
>
> >      1.06            +0.1        1.18        perf-profile.self.cycles-pp.mas_next_slot
> >      1.50            +0.5        1.97        perf-profile.self.cycles-pp.mas_find
> >      0.00            +1.4        1.35        perf-profile.self.cycles-pp.can_modify_mm
> >      3.13            +2.0        5.13        perf-profile.self.cycles-pp.mas_walk
>
> and looks like it's two different pathways. We have __do_sys_mremap ->
> mremap_to -> do_munmap -> do_vmi_munmap -> can_modify_mm for the
> destination mapping, but we also have mremap_to() calling
> can_modify_mm() directly for the source mapping.
>
There are two scenarios in the mremap syscall: 1> mremap_to (relocate
vma) and 2> shrink/expand. These two scenarios are handled by
different code paths:

For case 1> mremap_to (relocate vma)
-> can_modify_mm, check src for sealing
-> if MREMAP_FIXED
->-> do_munmap (dst) // free dst
->->-> do_vmi_munmap (dst)
->->->-> can_modify_mm (dst) // check dst for sealing
-> if dst size is smaller (shrink case)
->-> do_munmap(dst, to remove extra size)
->->-> do_vmi_munmap
->->->-> can_modify_mm(dst)
(potentially a duplicate of the MREMAP_FIXED check; in practice the
memory should already be unmapped at this point, so the cost is
looking up a non-existent memory range in the maple tree)

For case 2> shrink/expand
-> can_modify_mm, check addr for sealing
-> if dst size is smaller (shrink case)
->-> do_vmi_munmap(remove extra size)
->->-> can_modify_mm(addr)
(this is redundant because addr was already checked)

For case 2, we could potentially improve this by passing a flag into
do_vmi_munmap() to indicate that sealing was already checked by the
caller; a rough sketch of what I mean is appended below the quoted
text at the end of this mail. However, this idea has to be tested to
show an actual gain.

The reported regression is in mremap; I wonder why mprotect/munmap
don't show a similar impact, since they use the same pattern (one
extra out-of-place check of the memory range).

During version 9, I tested munmap/mprotect/madvise for perf [1]. The
test shows mseal adds 20-40 ns, or 50-100 CPU cycles, per call, which
is much smaller (about one tenth) than the change from 5.10 to 6.8.
The test uses multiple VMAs of various types [2].

The next step for me is to run stress-ng.pagemove.page_remaps_per_sec
to understand why mremap shows such a big regression.

[1] https://lore.kernel.org/all/20240214151130.616240-1-jeffxu@xxxxxxxxxxxx/
[2] https://github.com/peaktocreek/mmperf

Best regards,
-Jeff

> And then do_vmi_munmap() will do it's *own* vma_find() after having
> done arch_unmap().
>
> And do_munmap() will obviously do its own vma lookup as part of
> calling vma_to_resize().
>
> So it looks like a large portion of this regression is because the
> mseal addition just ends up walking the vma list way too much.
>
>               Linus
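P.S. Here is a rough, untested sketch of the case-2 idea, just to make
it concrete. The do_vmi_munmap() signature is only approximated from
the current tree, and the extra check_seal argument is hypothetical;
this is a sketch of the approach, not a proposed patch:

/*
 * Hypothetical: let callers that have already done the seal check
 * (e.g. the mremap shrink path, which already called can_modify_mm()
 * on [addr, addr+old_len)) tell do_vmi_munmap() to skip the second
 * can_modify_mm() walk of the maple tree.
 */
int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
		  unsigned long start, size_t len, struct list_head *uf,
		  bool unlock, bool check_seal /* hypothetical new arg */)
{
	unsigned long end;

	if (offset_in_page(start) || start > TASK_SIZE || len > TASK_SIZE - start)
		return -EINVAL;

	end = start + PAGE_ALIGN(len);
	if (end == start)
		return -EINVAL;

	/* Only walk the vmas for sealing if the caller did not already. */
	if (check_seal && unlikely(!can_modify_mm(mm, start, end)))
		return -EPERM;

	/* ... rest of the function unchanged ... */
}

and the shrink path of the mremap syscall would then pass
check_seal=false, roughly:

	/* sealing was already checked on [addr, addr+old_len) above */
	ret = do_vmi_munmap(&vmi, mm, addr + new_len, old_len - new_len,
			    &uf_unmap, true, false);

Whether skipping that one extra walk actually moves the
page_remaps_per_sec number is exactly what I want to measure with the
stress-ng run mentioned above.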