On Mon, Aug 21, 2023 at 10:28:29PM +0200, Mateusz Guzik wrote:
> While with the patch these allocations remain a significant problem,
> the primary bottleneck shifts to:
>
>     __pv_queued_spin_lock_slowpath+1
>     _raw_spin_lock_irqsave+57
>     folio_lruvec_lock_irqsave+91
>     release_pages+590
>     tlb_batch_pages_flush+61
>     tlb_finish_mmu+101
>     exit_mmap+327
>     __mmput+61
>     begin_new_exec+1245
>     load_elf_binary+712
>     bprm_execve+644
>     do_execveat_common.isra.0+429
>     __x64_sys_execve+50

Looking at this more closely, I don't think the patches I sent are going
to help much.  I'd say the primary problem you have is that you're trying
to free _a lot_ of pages at once on all CPUs.  Since it's the exit_mmap()
path, these are going to be the anonymous pages allocated to this task
(not the file pages it has mmaped).  The large anonymous folios work may
help you out here by decreasing the number of folios we have to manage,
and thus the length of time the LRU lock has to be held for.  It's not
an immediate solution, but I think it'll do the job once it lands.
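
To put a rough number on the "fewer folios to manage" point, here is a
back-of-the-envelope sketch, not a measurement.  It assumes 4 KiB base
pages, a hypothetical task with 1 GiB of anonymous memory, and that every
folio has the same order; a real task would have a mix of sizes.  The
function name and the figures are illustrative only, and the folio count is
just a proxy for the work done under the LRU lock.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages, as on x86-64


def folios_to_free(anon_bytes: int, folio_order: int) -> int:
    """Folios needed to cover anon_bytes if every folio has the given order."""
    folio_bytes = PAGE_SIZE << folio_order
    return -(-anon_bytes // folio_bytes)  # ceiling division


if __name__ == "__main__":
    anon = 1 << 30  # hypothetical: 1 GiB of anonymous memory in the task
    for order in (0, 2, 4):
        count = folios_to_free(anon, order)
        print(f"order {order}: {count} folios to take off the LRU")
```

Under these assumptions, going from order-0 to order-4 folios cuts the
number of folios the exit path has to take off the LRU by a factor of 16.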