On 05/16/23 at 10:54am, Thomas Gleixner wrote:
> On Tue, May 16 2023 at 16:07, Baoquan He wrote:
> > On 05/16/23 at 08:40am, Thomas Gleixner wrote:
> >> On Tue, May 16 2023 at 10:26, Baoquan He wrote:
> >> > On 05/15/23 at 08:17pm, Uladzislau Rezki wrote:
> >> >> For systems which lack a full TLB flush, flushing a long range is
> >> >> a problem (it takes time), so perhaps we can flush VAs one by one.
> >> >> Currently we calculate a flush range [min:max], and that range
> >> >> includes space that might not be mapped at all. Like below:
> >> >
> >> > It's fine if we only calculate a flush range of [min:max] from the VAs.
> >> > But vm_reset_perms() calculates the flush range from the impacted
> >> > direct mapping range, then merges it with the VA's range. That looks
> >> > really strange and surprising. If the vm->pages[] come from a lower
> >> > part of physical memory, the final merged flush will span a tremendous
> >> > range. I wonder why we need to merge the direct map range with the VA
> >> > range before flushing. Not sure if I misunderstand it.
> >>
> >> So what happens on this BPF teardown is:
> >>
> >> The vfree(8k) ends up flushing 3 entries: the actual vmalloc part (2) and
> >> one extra which is in the direct map. I haven't verified that yet, but I
> >> assume it's the alias of one of the vmalloc'ed pages.
> >
> > That looks like the reason. As Uladzislau pointed out, architectures may
> > have a full TLB flush, so they won't get into trouble from the merged
> > flush computed the [min:max] way, e.g. arm64's and x86's
> > flush_tlb_kernel_range(). However, arm32 seems to lack the ability to do
> > a full TLB flush.
>
> ARM has a full flush, but it does not check for that in
> flush_tlb_kernel_range().
>
> > If agreed, I can make a draft patch to do the flush for the direct map
> > and the VAs separately, and see if it works.
>
> Of course it works. I have already done that.
>
> But you are missing the point. Look at the examples I provided.
>
> The current implementation ends up doing a full flush on x86 just to
> flush 3 TLB entries, for the very same reason: the flush range
> (start..end) becomes insanely large due to the direct map and vmalloc
> parts.
>
> But doing individual flushes for the direct map and vmalloc space is
> silly too, because then it ends up doing two IPIs instead of one. IPIs
> are expensive, and the whole point of coalescing the flushes is to spare
> IPIs, no?
>
> So with my hacked-up flush_tlb_kernel_vas() I end up having exactly
> _one_ IPI which walks the list and flushes the 3 TLB entries.

Makes sense, thanks for explaining. However, your handling of alias_va may not be right. I will add an inline comment in your patch; please check there.