On 17/02/2025 04:48, Anshuman Khandual wrote:
>
>
> On 2/13/25 15:08, Ryan Roberts wrote:
>> On 13/02/2025 05:30, Anshuman Khandual wrote:
>>>
>>>
>>> On 2/10/25 16:42, Ryan Roberts wrote:
>>>> On 10/02/2025 08:03, Anshuman Khandual wrote:
>>>>>
>>>>>
>>>>> On 2/5/25 20:39, Ryan Roberts wrote:
>>>>>> Because the kernel can't tolerate page faults for kernel mappings,
>>>>>> when setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits
>>>>>> a dsb(ishst) to ensure that the store to the pgtable is observed by
>>>>>> the table walker immediately. Additionally it emits an isb() to
>>>>>> ensure that any already speculatively determined invalid mapping
>>>>>> fault gets canceled.
>>>>>>
>>>>>> We can improve the performance of vmalloc operations by batching
>>>>>> these barriers until the end of a set of entry updates. The newly
>>>>>> added arch_update_kernel_mappings_begin() /
>>>>>> arch_update_kernel_mappings_end() provide the required hooks.
>>>>>>
>>>>>> vmalloc improves by up to 30% as a result.
>>>>>>
>>>>>> Two new TIF_ flags are created; TIF_KMAP_UPDATE_ACTIVE tells us if we
>>>>>> are in batch mode and can therefore defer any barriers until the end
>>>>>> of the batch. TIF_KMAP_UPDATE_PENDING tells us if barriers are queued
>>>>>> to be emitted at the end of the batch.
>>>>>
>>>>> Why can't this be achieved with a single TIF_KMAP_UPDATE_ACTIVE which
>>>>> is set in __begin(), cleared in __end() and saved across a
>>>>> __switch_to()?
>>>>
>>>> So unconditionally emit the barriers in _end(), and emit them in
>>>> __switch_to() if TIF_KMAP_UPDATE_ACTIVE is set?
>>>
>>> Right.
>>>
>>>>
>>>> I guess if you're calling _begin() then you are definitely going to be
>>>> setting at least one PTE, so you can definitely emit the barriers
>>>> unconditionally. I was trying to protect against the case where you get
>>>> pre-empted (potentially multiple times) while in the loop. The
>>>> TIF_KMAP_UPDATE_PENDING flag ensures you only emit the barriers when
>>>> you definitely need to. Without it, you would have to emit on every
>>>> pre-emption even if no more PTEs got set.
>>>>
>>>> But I suspect this is a premature optimization. Probably it will never
>>>> occur. So I'll simplify as you suggest.
>>>
>>> Agreed.
>>
>> Having done this simplification, I've just noticed that one of the
>> arch_update_kernel_mappings_begin/end callsites is
>> __apply_to_page_range(), which gets called for user space mappings as
>> well as kernel mappings. So actually, with the simplification, I'll be
>> emitting barriers even when only user space mappings were touched.
>
> Right, that will not be desirable.
>
>>
>> I think there are a couple of options to fix this:
>>
>> - Revert to the two-flag approach. For the user space case, I'll get to
>>   _end() and notice that no barriers are queued, so will emit nothing.
>>
>> - Only set TIF_KMAP_UPDATE_ACTIVE if the address range passed to _begin()
>>   is a kernel address range. I guess that's just a case of checking if
>>   the MSB is set in "end"?
>>
>> - Pass mm to _begin() and only set TIF_KMAP_UPDATE_ACTIVE if mm ==
>>   &init_mm. I guess this should be the same as option 2.
>>
>> I'm leaning towards option 2. But I have a niggling feeling that my
>> proposed check isn't quite correct. What do you think?
>
> Options 2 and 3 look better than the two-flag approach proposed earlier.
> But isn't option 3 a bit simpler than option 2? Does passing the struct mm
> argument into these functions create more code churn?
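To make the agreed simplification concrete, here is a minimal sketch of the
single-flag scheme. The flag and hook names are taken from the thread above;
the function bodies and the switch-time helper are my assumptions, not the
actual patch:

static inline void emit_kmap_barriers(void)
{
	dsb(ishst);	/* make pgtable stores visible to the table walker */
	isb();		/* cancel any speculatively determined stale fault */
}

void arch_update_kernel_mappings_begin(unsigned long start, unsigned long end)
{
	set_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
}

void arch_update_kernel_mappings_end(unsigned long start, unsigned long end)
{
	clear_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
	emit_kmap_barriers();	/* _begin() implies at least one PTE was set */
}

/* Called from __switch_to(); prev may be mid-batch if it was pre-empted. */
static inline void kmap_update_switch(struct task_struct *prev)
{
	if (test_tsk_thread_flag(prev, TIF_KMAP_UPDATE_ACTIVE))
		emit_kmap_barriers();
}
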
Actually, looking at this again, I think the best approach is that, when
called in the context of __apply_to_page_range(), we only call
arch_update_kernel_mappings_[begin|end]() if mm == &init_mm. The function is
explicitly for "kernel mappings", so it doesn't make sense to call it for
user mappings. Looking at the current implementations of
arch_sync_kernel_mappings(), they are filtering on kernel addresses anyway,
so this should be safe.

Thanks,
Ryan
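For illustration, the filtering described above might look like the below in
__apply_to_page_range(). The hook signatures follow the thread;
apply_to_pgd_range() is a hypothetical stand-in for the existing
level-by-level walk, not the real helper:

static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
				 unsigned long size, pte_fn_t fn, void *data,
				 bool create)
{
	unsigned long end = addr + size;
	bool kernel = (mm == &init_mm);	/* operating on kernel page tables? */
	int err;

	if (kernel)
		arch_update_kernel_mappings_begin(addr, end);

	/* Walk the levels and apply fn, as the function does today. */
	err = apply_to_pgd_range(mm, addr, end, fn, data, create);

	if (kernel)
		arch_update_kernel_mappings_end(addr, end);

	return err;
}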