On 01/02/2021 14:08, Vlastimil Babka wrote:
> On 1/8/21 3:39 PM, Milan Broz wrote:
>> On 08/01/2021 14:41, Michal Hocko wrote:
>>> On Wed 06-01-21 16:20:15, Milan Broz wrote:
>>>> Hi,
>>>>
>>>> we use mlockall(MCL_CURRENT | MCL_FUTURE) / munlockall() in cryptsetup code
>>>> and someone tried to use it with hardened memory allocator library.
>>>>
>>>> Execution time was increased to extreme (minutes) and as we found, the problem
>>>> is in munlockall().
>>>>
>>>> Here is a plain reproducer for the core without any external code - it takes
>>>> unlocking on Fedora rawhide kernel more than 30 seconds!
>>>> I can reproduce it on 5.10 kernels and Linus' git.
>>>>
>>>> The reproducer below tries to mmap large amount memory with PROT_NONE (later never used).
>>>> The real code of course does something more useful but the problem is the same.
>>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <fcntl.h>
>>>> #include <sys/mman.h>
>>>>
>>>> int main (int argc, char *argv[])
>>>> {
>>>>         void *p = mmap(NULL, 1UL << 41, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>
>>>>         if (p == MAP_FAILED) return 1;
>>>>
>>>>         if (mlockall(MCL_CURRENT | MCL_FUTURE)) return 1;
>>>>         printf("locked\n");
>>>>
>>>>         if (munlockall()) return 1;
>>>>         printf("unlocked\n");
>>>>
>>>>         return 0;
>>>> }

...

>> Today's Linus git - 5.11.0-rc2+ in my testing x86_64 VM (no extensive kernel debug options):
>>
>> # time ./lock
>> locked
>> unlocked
>>
>> real    0m4.172s
>> user    0m0.000s
>> sys     0m4.172s
>
> The perf report would be more interesting from this configuration.

ok, I cannot run perf on that particular VM but tried the latest Fedora stable
kernel without debug options - 5.10.12-200.fc33.x86_64

This is the report running reproducer above:

time:
real    0m6.123s
user    0m0.099s
sys     0m5.310s

perf:

# Total Lost Samples: 0
#
# Samples: 20K of event 'cycles'
# Event count (approx.): 20397603279
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ............................
#
    47.26%  lock     [kernel.kallsyms]  [k] follow_page_mask
    20.43%  lock     [kernel.kallsyms]  [k] munlock_vma_pages_range
    15.92%  lock     [kernel.kallsyms]  [k] follow_page
     7.40%  lock     [kernel.kallsyms]  [k] rcu_all_qs
     5.87%  lock     [kernel.kallsyms]  [k] _cond_resched
     3.08%  lock     [kernel.kallsyms]  [k] follow_huge_addr
     0.01%  lock     [kernel.kallsyms]  [k] __update_load_avg_cfs_rq
     0.01%  lock     [kernel.kallsyms]  [k] ____fput
     0.01%  lock     [kernel.kallsyms]  [k] rmap_walk_file
     0.00%  lock     [kernel.kallsyms]  [k] page_mapped
     0.00%  lock     [kernel.kallsyms]  [k] native_irq_return_iret
     0.00%  lock     [kernel.kallsyms]  [k] _raw_spin_lock_irq
     0.00%  lock     [kernel.kallsyms]  [k] perf_iterate_ctx
     0.00%  lock     [kernel.kallsyms]  [k] finish_task_switch
     0.00%  perf     [kernel.kallsyms]  [k] native_sched_clock
     0.00%  lock     [kernel.kallsyms]  [k] native_write_msr
     0.00%  perf     [kernel.kallsyms]  [k] native_write_msr

m.