Re: Excessive TLB flush ranges

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Wed, 17 May 2023 13:58:44 +0200

On Wed, May 17 2023 at 13:26, Uladzislau Rezki wrote:
> On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote:
>> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism
>> achieves:
>> 
>>   1) It batches multiple ranges to _one_ invocation
>> 
>>   2) It lets the architecture decide based on the number of pages
>>      whether it does a tlb_flush_all() or a flush of individual ranges.
>> 
>> Whether the architecture uses IPIs or flushes only locally and the
>> hardware propagates that is completely irrelevant.
>> 
>> Right now any coalesced range, which is huge due to massive holes, takes
>> decision #2 away.
>> 
>> If you want to flush individual VAs from the core vmalloc code then you
>> lose #1, as the aggregated number of pages might justify a tlb_flush_all().
>> 
>> That's a pure architecture decision and all the core code needs to do is
>> to provide appropriate information and not some completely bogus request
>> to flush 17312759359 pages, i.e. a ~64.5 TB range, while in reality
>> there are exactly _three_ distinct pages to flush.
>> 
> 1.
>
> I think, all two cases(logic) should be moved into ARCH code, so a decision 
> is made _not_ by vmalloc code how to flush, either fully, if it supported or
> page by page that require list chasing.

Which is exactly what my patch does, no?

> 2.
> void vfree(const void *addr)
> {
> ...
> 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
> 		vm_reset_perms(vm); <----
> ...
> }
>
> so, all purged areas are drained in a caller context, so it is blocked
> until the drain is done including flushing. I am not sure why it is done
> from a caller context.
>
> IMHO, it should be deferred same way as we do in:
>
> static void free_vmap_area_noflush(struct vmap_area *va)

How is that avoiding the problem? It just deferres it to some point in
the future. There is no guarantee that batching will be large enough to
justify a full flush.

> if do not miss the point why vfree() has to do it directly.

Keeping executable mappings around until some other flush happens is
obviously neither a brilliant idea nor correct.

Thanks

        tglx