On 05/23/23 at 04:02pm, Thomas Gleixner wrote: > _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a > page are left in the system. This is required due to the lazy TLB flush > mechanism in vmalloc. > > This is tried to achieve by walking the per CPU free lists, but those do > not contain fully utilized vmap blocks because they are removed from the > free list once the blocks free space became zero. The problem description is not accurate. This is tried to achieve for va associated with vmap_block by walking the per CPU free lists, those fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy() by calculating the [min:max] of purge_vmap_area_list, because va of vmap_blocks will be added to purge_vmap_area_list too via vb_free(). If we finally exclude the vmap_block va from purge list in __purge_vmap_area_lazy(), then we can say that stale TLBs is missed to flush. No? IMHO, this is a preparation work for the vmap_block va excluding from purge list flushing. Please correct me if I am wrong. > > So the per CPU list iteration does not find the block and if the page was > mapped via such a block and the TLB has not yet been flushed, the guarantee > of _vm_unmap_aliases() that there are no stale TLBs after returning is > broken: > > x = vb_alloc() // Removes vmap_block from free list because vb->free became 0 > vb_free(x) // Unmaps page and marks in dirty_min/max range > > // Page is reused > vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL > > So instead of walking the per CPU free lists, walk the per CPU xarrays > which hold pointers to _all_ active blocks in the system including those > removed from the free lists. > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > --- > mm/vmalloc.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l > for_each_possible_cpu(cpu) { > struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu); > struct vmap_block *vb; > + unsigned long idx; > > rcu_read_lock(); > - list_for_each_entry_rcu(vb, &vbq->free, free_list) { > + xa_for_each(&vbq->vmap_blocks, idx, vb) { > spin_lock(&vb->lock); > if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) { > unsigned long va_start = vb->va->va_start; >