On 17/02/2025 04:48, Anshuman Khandual wrote:
>
>
> On 2/13/25 15:08, Ryan Roberts wrote:
>> On 13/02/2025 05:30, Anshuman Khandual wrote:
>>>
>>>
>>> On 2/10/25 16:42, Ryan Roberts wrote:
>>>> On 10/02/2025 08:03, Anshuman Khandual wrote:
>>>>>
>>>>>
>>>>> On 2/5/25 20:39, Ryan Roberts wrote:
>>>>>> Because the kernel can't tolerate page faults for kernel mappings,
>>>>>> when setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits
>>>>>> a dsb(ishst) to ensure that the store to the pgtable is observed by
>>>>>> the table walker immediately. Additionally it emits an isb() to
>>>>>> ensure that any already speculatively determined invalid mapping
>>>>>> fault gets canceled.
>>>>>>
>>>>>> We can improve the performance of vmalloc operations by batching
>>>>>> these barriers until the end of a set of entry updates. The newly
>>>>>> added arch_update_kernel_mappings_begin() /
>>>>>> arch_update_kernel_mappings_end() provide the required hooks.
>>>>>>
>>>>>> vmalloc improves by up to 30% as a result.
>>>>>>
>>>>>> Two new TIF_ flags are created; TIF_KMAP_UPDATE_ACTIVE tells us if we
>>>>>> are in batch mode and can therefore defer any barriers until the end
>>>>>> of the batch. TIF_KMAP_UPDATE_PENDING tells us if barriers are queued
>>>>>> to be emitted at the end of the batch.
>>>>>
>>>>> Why can't this be achieved with a single TIF_KMAP_UPDATE_ACTIVE which
>>>>> is set in __begin(), cleared in __end() and saved across a
>>>>> __switch_to()?
>>>>
>>>> So unconditionally emit the barriers in _end(), and emit them in
>>>> __switch_to() if TIF_KMAP_UPDATE_ACTIVE is set?
>>>
>>> Right.
>>>
>>>>
>>>> I guess if you're calling _begin() then you are definitely going to be
>>>> setting at least one PTE, so you can definitely emit the barriers
>>>> unconditionally. I was trying to protect against the case where you get
>>>> pre-empted (potentially multiple times) while in the loop. The
>>>> TIF_KMAP_UPDATE_PENDING flag ensures you only emit the barriers when
>>>> you definitely need to. Without it, you would have to emit on every
>>>> pre-emption even if no more PTEs got set.
>>>>
>>>> But I suspect this is a premature optimization. Probably it will never
>>>> occur. So I'll simplify as you suggest.
>>>
>>> Agreed.
>>
>> Having done this simplification, I've just noticed that one of the
>> arch_update_kernel_mappings_begin/end callsites is
>> __apply_to_page_range(), which gets called for user space mappings as
>> well as kernel mappings. So actually, with the simplification, I'll be
>> emitting barriers even when only user space mappings were touched.
>
> Right, that will not be desirable.
>
>>
>> I think there are a couple of options to fix this:
>>
>> - Revert to the two-flag approach. For the user space case, I'll get to
>>   _end() and notice that no barriers are queued, so will emit nothing.
>>
>> - Only set TIF_KMAP_UPDATE_ACTIVE if the address range passed to _begin()
>>   is a kernel address range. I guess that's just a case of checking if
>>   the MSB is set in "end"?
>>
>> - Pass mm to _begin() and only set TIF_KMAP_UPDATE_ACTIVE if mm ==
>>   &init_mm. I guess this should be the same as option 2.
>>
>> I'm leaning towards option 2. But I have a niggling feeling that my
>> proposed check isn't quite correct. What do you think?
>
> Options 2 and 3 look better than the two-flag approach proposed earlier.
> But isn't option 3 a bit simpler than option 2? Does passing the struct mm
> argument into these functions create more code churn?
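To make the agreed simplification concrete, here is a minimal sketch of the
single-flag scheme. The flag and hook names are taken from the thread above;
the function bodies and the switch-time helper are my assumptions, not the
actual patch:

static inline void emit_kmap_barriers(void)
{
	dsb(ishst);	/* make pgtable stores visible to the table walker */
	isb();		/* cancel any speculatively determined stale fault */
}

void arch_update_kernel_mappings_begin(unsigned long start, unsigned long end)
{
	set_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
}

void arch_update_kernel_mappings_end(unsigned long start, unsigned long end)
{
	clear_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
	emit_kmap_barriers();	/* _begin() implies at least one PTE was set */
}

/* Called from __switch_to(); prev may be mid-batch if it was pre-empted. */
static inline void kmap_update_switch(struct task_struct *prev)
{
	if (test_tsk_thread_flag(prev, TIF_KMAP_UPDATE_ACTIVE))
		emit_kmap_barriers();
}
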
Actually, looking at this again, I think the best approach is that, when
called in the context of __apply_to_page_range(), we only call
arch_update_kernel_mappings_[begin|end]() if mm == &init_mm. The function is
explicitly for "kernel mappings", so it doesn't make sense to call it for
user mappings. Looking at the current implementations of
arch_sync_kernel_mappings(), they are filtering on kernel addresses anyway,
so this should be safe.

Thanks,
Ryan
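For illustration, the filtering described above might look like the below in
__apply_to_page_range(). The hook signatures follow the thread;
apply_to_pgd_range() is a hypothetical stand-in for the existing
level-by-level walk, not the real helper:

static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
				 unsigned long size, pte_fn_t fn, void *data,
				 bool create)
{
	unsigned long end = addr + size;
	bool kernel = (mm == &init_mm);	/* operating on kernel page tables? */
	int err;

	if (kernel)
		arch_update_kernel_mappings_begin(addr, end);

	/* Walk the levels and apply fn, as the function does today. */
	err = apply_to_pgd_range(mm, addr, end, fn, data, create);

	if (kernel)
		arch_update_kernel_mappings_end(addr, end);

	return err;
}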