On Thu, Dec 02, 2021, David Matlack wrote:
> On Thu, Dec 2, 2021 at 10:43 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > Because they're two different things.  Lock contention is already handled
> > by tdp_mmu_iter_cond_resched().  If mmu_lock is not contended, holding it
> > for a long duration is a complete non-issue.
>
> So I think you are positing that disabling reclaim will make the
> allocations fast enough that the time between
> tdp_mmu_iter_cond_resched checks will be acceptable.

Yep.

> Is there really no risk of long tail latency in kmem_cache_alloc() or
> __get_free_page()?  Even if it's rare, they will be common at scale.

If there is a potentially long latency in __get_free_page(), then we're hosed
no matter what, because per alloc_pages() it's allowed in any context,
including NMI, IRQ, and Soft-IRQ.  I've no idea how often those contexts
allocate, but I assume it's not _that_ rare given the amount of stuff that
networking does in Soft-IRQ context, e.g. see the stack trace from commit
2620fe268e80, the use of PF_MEMALLOC, the use of GFP_ATOMIC in napi_alloc_skb,
etc...  And it's not just direct allocations, e.g. anything that uses a radix
tree or XArray will potentially trigger an allocation on insertion.

But I would be very, very surprised if alloc_pages() without
__GFP_DIRECT_RECLAIM has a long tail latency, otherwise allocating from any
atomic context would be doomed.

> This is why I'm being so hesitant, and prefer to avoid the problem
> entirely by doing all allocations outside the lock.  But I'm honestly
> more than happy to be convinced otherwise and go with your approach.
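
As a purely illustrative aside (a hand-waved sketch, not the actual KVM code;
the helper name below is made up), the kind of non-reclaiming allocation being
discussed is roughly:

  #include <linux/gfp.h>
  #include <linux/mm.h>

  /*
   * Illustrative only: allocate a zeroed page without __GFP_DIRECT_RECLAIM.
   * GFP_NOWAIT may wake kswapd but can never sleep in direct reclaim;
   * GFP_ATOMIC would additionally allow dipping into the atomic reserves.
   */
  static u64 *alloc_sp_no_reclaim(void)
  {
          struct page *page = alloc_pages(GFP_NOWAIT | __GFP_ZERO, 0);

          return page ? page_address(page) : NULL;
  }

If an allocation like that can stall for a long time, then every GFP_ATOMIC or
GFP_NOWAIT caller in atomic context has the same problem, which is the point
above.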