On Wed, Apr 19, 2023 at 01:30:57PM +0200, David Hildenbrand wrote:
> On 06.04.23 20:27, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 05:51:52PM +0200, David Hildenbrand wrote:
> > > On 06.04.23 17:02, Peter Zijlstra wrote:
> > > > 
> > > > DavidH, what do you think about reviving Jann's patches here:
> > > > 
> > > >   https://bugs.chromium.org/p/project-zero/issues/detail?id=2365#c1
> > > > 
> > > > Those are far more invasive, but afaict they seem to do the right thing.
> > > > 
> > > 
> > > I recall seeing those while discussed on security@xxxxxxxxxx. What we
> > > currently have was (IMHO for good reasons) deemed better to fix the issue,
> > > especially when caring about backports and getting it right.
> > 
> > Yes, and I think that was the right call. However, we can now revisit
> > without having the pressure of a known defect and backport
> > considerations.
> > 
> > > The alternative that was discussed in that context IIRC was to simply
> > > allocate a fresh page table, place the fresh page table into the list
> > > instead, and simply free the old page table (then using common machinery).
> > > 
> > > TBH, I'd wish (and recently raised) that we could just stop wasting memory
> > > on page tables for THPs that are maybe never going to get PTE-mapped ... and
> > > eventually just allocate on demand (with some caching?) and handle the
> > > places where we're OOM and cannot PTE-map a THP in some decent way.
> > > 
> > > ... instead of trying to figure out how to deal with these page tables we
> > > cannot free but have to special-case simply because of GUP-fast.
> > 
> > Not keeping them around sounds good to me, but I'm not *that* familiar
> > with the THP code; most of that happened after I stopped tracking mm. So
> > I'm not sure how feasible it is.
> > 
> > But it does look entirely feasible to rework this page-table freeing
> > along the lines Jann did.
> 
> It's most probably more feasible, although the easiest would be to just
> allocate a fresh page table to deposit and free the old one using the mmu
> gatherer.
> 
> This way we can avoid the khugepaged usage of tlb_remove_table_smp_sync(),
> but not the tlb_remove_table_one() usage. I suspect khugepaged isn't really
> relevant in RT kernels (IIRC, most RT setups disable THP completely).

People will disable khugepaged because it causes IPIs (and having to
disable khugepaged is configuration overhead and a source of headaches
when configuring a realtime system, since one can forget to do it, etc.).

But people do want to run non-RT applications alongside RT applications
(for example, when you have a single box in a privileged location).

> 
> tlb_remove_table_one() only triggers if __get_free_page(GFP_NOWAIT |
> __GFP_NOWARN) fails. IIUC, that can happen easily under memory pressure
> because it doesn't wait for direct reclaim.
> 
> I don't know much about RT workloads (so I'd appreciate some feedback), but
> I guess we can run into memory pressure as well due to some !rt housekeeping
> task on the system?

Yes, exactly (memory for the -RT app will be mlocked).
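To make the memory-pressure point concrete, here is a minimal userspace sketch in C of the fallback described above, loosely modeled on tlb_remove_table() in mm/mmu_gather.c. It is not kernel code: struct gather, alloc_batch_nowait(), sync_all_cpus(), run_scenario() and the batch size of 4 are all hypothetical, a flag stands in for GFP_NOWAIT failing, and a counter stands in for the IPI broadcast. It only shows the shape of the logic: while the non-blocking batch allocation succeeds, tables are freed in batches with no IPI; once it fails, every single table costs a synchronous broadcast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, loosely based on tlb_remove_table():
 * tables are normally collected in a batch and freed later; if the batch
 * cannot be allocated without blocking, the table is freed on the spot
 * after a synchronous "sync all CPUs" step.
 */

#define MAX_TABLE_BATCH 4 /* hypothetical; the kernel sizes this to fill a page */

struct table_batch {
	unsigned int nr;
	void *tables[MAX_TABLE_BATCH];
};

struct gather {
	struct table_batch *batch;
	bool simulate_alloc_failure; /* models GFP_NOWAIT failing under pressure */
	unsigned int ipis_sent;      /* synchronous broadcasts to other CPUs */
	unsigned int freed_now;      /* tables freed on the fallback path */
	unsigned int batched;        /* tables freed via the batch path */
};

/* Stands in for __get_free_page(GFP_NOWAIT | __GFP_NOWARN). */
static struct table_batch *alloc_batch_nowait(struct gather *g)
{
	if (g->simulate_alloc_failure)
		return NULL;
	return calloc(1, sizeof(struct table_batch));
}

/* Stands in (roughly) for the IPI broadcast; here it only counts. */
static void sync_all_cpus(struct gather *g)
{
	g->ipis_sent++;
}

/* The kernel defers this free until lockless walkers are done; the model frees at once. */
static void flush_batch(struct gather *g)
{
	for (unsigned int i = 0; i < g->batch->nr; i++)
		free(g->batch->tables[i]);
	g->batched += g->batch->nr;
	free(g->batch);
	g->batch = NULL;
}

static void remove_table(struct gather *g, void *table)
{
	assert(table != NULL);

	if (g->batch == NULL) {
		g->batch = alloc_batch_nowait(g);
		if (g->batch == NULL) {
			/* Fallback: no batch, so sync every CPU, then free now. */
			sync_all_cpus(g);
			free(table);
			g->freed_now++;
			return;
		}
	}

	g->batch->tables[g->batch->nr++] = table;
	if (g->batch->nr == MAX_TABLE_BATCH)
		flush_batch(g);
}

/* Flush any partially filled batch at the end of the operation. */
static void gather_finish(struct gather *g)
{
	if (g->batch != NULL)
		flush_batch(g);
}

/* Remove n freshly allocated "tables" through one gather and return the counters. */
static struct gather run_scenario(bool alloc_fails, unsigned int n)
{
	struct gather g = { .simulate_alloc_failure = alloc_fails };

	for (unsigned int i = 0; i < n; i++)
		remove_table(&g, malloc(64));
	gather_finish(&g);
	return g;
}
```

With run_scenario(false, 6) the model sends no IPIs and all six tables go through the batch path; with run_scenario(true, 3) every removal takes the fallback, so three tables cost three broadcasts, which is the kind of interruption isolated RT CPUs would see when a !RT task drives the system into memory pressure.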