On Wed, May 17 2023 at 13:26, Uladzislau Rezki wrote: > On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote: >> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism >> achieves: >> >> 1) It batches multiple ranges to _one_ invocation >> >> 2) It lets the architecture decide based on the number of pages >> whether it does a tlb_flush_all() or a flush of individual ranges. >> >> Whether the architecture uses IPIs or flushes only locally and the >> hardware propagates that is completely irrelevant. >> >> Right now any coalesced range, which is huge due to massive holes, takes >> decision #2 away. >> >> If you want to flush individual VAs from the core vmalloc code then you >> lose #1, as the aggregated number of pages might justify a tlb_flush_all(). >> >> That's a pure architecture decision and all the core code needs to do is >> to provide appropriate information and not some completely bogus request >> to flush 17312759359 pages, i.e. a ~64.5 TB range, while in reality >> there are exactly _three_ distinct pages to flush. >> > 1. > > I think, all two cases(logic) should be moved into ARCH code, so a decision > is made _not_ by vmalloc code how to flush, either fully, if it supported or > page by page that require list chasing. Which is exactly what my patch does, no? > 2. > void vfree(const void *addr) > { > ... > if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS)) > vm_reset_perms(vm); <---- > ... > } > > so, all purged areas are drained in a caller context, so it is blocked > until the drain is done including flushing. I am not sure why it is done > from a caller context. > > IMHO, it should be deferred same way as we do in: > > static void free_vmap_area_noflush(struct vmap_area *va) How is that avoiding the problem? It just deferres it to some point in the future. There is no guarantee that batching will be large enough to justify a full flush. > if do not miss the point why vfree() has to do it directly. Keeping executable mappings around until some other flush happens is obviously neither a brilliant idea nor correct. Thanks tglx