On Wed, May 17, 2023 at 01:58:44PM +0200, Thomas Gleixner wrote:
> On Wed, May 17 2023 at 13:26, Uladzislau Rezki wrote:
> > On Tue, May 16, 2023 at 07:04:34PM +0200, Thomas Gleixner wrote:
> >> The proposed flush_tlb_kernel_vas(list, num_pages) mechanism achieves:
> >>
> >>  1) It batches multiple ranges to _one_ invocation
> >>
> >>  2) It lets the architecture decide based on the number of pages
> >>     whether it does a tlb_flush_all() or a flush of individual ranges.
> >>
> >> Whether the architecture uses IPIs or flushes only locally and the
> >> hardware propagates that is completely irrelevant.
> >>
> >> Right now any coalesced range, which is huge due to massive holes,
> >> takes decision #2 away.
> >>
> >> If you want to flush individual VAs from the core vmalloc code then
> >> you lose #1, as the aggregated number of pages might justify a
> >> tlb_flush_all().
> >>
> >> That's a pure architecture decision and all the core code needs to do
> >> is to provide appropriate information and not some completely bogus
> >> request to flush 17312759359 pages, i.e. a ~64.5 TB range, while in
> >> reality there are exactly _three_ distinct pages to flush.
> >>
> > 1.
> >
> > I think both cases (the logic) should be moved into arch code, so that
> > the decision how to flush - either fully, if that is supported, or page
> > by page, which requires chasing the list - is _not_ made by the vmalloc
> > code.
>
> Which is exactly what my patch does, no?
>
> > 2.
> >
> > void vfree(const void *addr)
> > {
> > 	...
> > 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
> > 		vm_reset_perms(vm);  <----
> > 	...
> > }
> >
> > so all purged areas are drained in the caller's context, i.e. the
> > caller is blocked until the drain, including the flushing, is done.
> > I am not sure why it is done from the caller's context.
> >
> > IMHO, it should be deferred the same way as we do in:
> >
> > static void free_vmap_area_noflush(struct vmap_area *va)
>
> How is that avoiding the problem? It just defers it to some point in
> the future. There is no guarantee that batching will be large enough to
> justify a full flush.
>
> > if I do not miss the point why vfree() has to do it directly.
>
> Keeping executable mappings around until some other flush happens is
> obviously neither a brilliant idea nor correct.
>
It avoids blocking the caller of vfree() by deferring the freeing into a
workqueue context. At least I got the feeling that "your task", the one
doing the vfree(), blocks for an unacceptable time. That can only happen
if it frees a VM_FLUSH_RESET_PERMS area (all other freeing is deferred):

<snip>
	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
		vm_reset_perms(vm);
<snip>

In that case vfree() can take some time instead of returning to the user
asap. Is that your issue? I am not saying it is only the TLB flushing that
takes time; in this case waiting on the mutex can also take time.

--
Uladzislau Rezki
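
To make point 1. above concrete, here is a rough, untested sketch of what
such an arch-side flush_tlb_kernel_vas() could look like. The exact
signature, the list layout and the threshold value are assumptions made
for illustration only; this is not the actual patch under discussion:

<snip>
#include <linux/list.h>
#include <linux/vmalloc.h>
#include <asm/tlbflush.h>

/* Hypothetical cut-off, in pages, above which a full flush is cheaper. */
#define TLB_FLUSH_ALL_THRESHOLD		512

void flush_tlb_kernel_vas(struct list_head *va_list, unsigned long num_pages)
{
	struct vmap_area *va;

	/* Case 2: the aggregate is large enough to justify a full flush. */
	if (num_pages > TLB_FLUSH_ALL_THRESHOLD) {
		flush_tlb_all();
		return;
	}

	/* Case 1: walk the batched ranges and flush each distinct VA range. */
	list_for_each_entry(va, va_list, list)
		flush_tlb_kernel_range(va->va_start, va->va_end);
}
<snip>

The core vmalloc code would then only hand over the list of distinct
vmap_areas plus the aggregated page count, and the full-flush vs.
per-range decision stays entirely with the architecture.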