On 05/22/23 at 10:21pm, Thomas Gleixner wrote: > On Mon, May 22 2023 at 22:34, Baoquan He wrote: > > On 05/22/23 at 02:02pm, Thomas Gleixner wrote: > >> > @@ -1736,6 +1737,14 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > >> > list_replace_init(&purge_vmap_area_list, &local_purge_list); > >> > spin_unlock(&purge_vmap_area_lock); > >> > > >> > + vb = container_of(va, struct vmap_block, va); > >> > >> This cannot work vmap_area is not embedded in vmap_block. vmap_block::va > >> is a pointer. vmap_area does not link back to vmap_block, so there is no > >> way to find it based on a vmap_area. > > > > Oh, the code is buggy. va->flags can tell if it's vmap_block, then we > > can deduce the vb pointer. > > No. It _CANNOT_ work whether you check the flags or not. > > struct foo { > ..... > struct bar bar; > }; > > container_of(ptr_to_bar, struct foo, bar) returns the pointer to the > struct foo which has struct bar embedded. > > But > > struct foo { > ..... > struct bar *bar; > }; > > cannot do that because ptr_to_bar points to some object which is > completely disconnected from struct foo. > > Care to look at the implementation of container_of()? > > Here is what it boils down to: > > void *member_pointer = bar; > > p = (struct foo *)(member_pointer - offsetof(struct foo, bar); > > So it uses the pointer to bar and subtracts the offset of bar in struct > foo. This obviously can only work when struct bar is embedded in struct > foo. > > Lets assume that *bar is the first member of foo, i.e. offset of *bar in > struct foo is 0 > > p = (struct foo *)(member_pointer - 0); > > So you end up with > > p == member_pointer == bar > > But you won't get there because the static_assert() in container_of() > will catch that and the compiler will tell you in colourful ways. Thanks a lot, learn it now. I never noticed container_of() is not suitable for pointer member of struct case. > > Once the vmap area is handed over for cleaning up the vmap block is gone > and even if you let it stay around then the vmap area does not have any > information where to find the block. > > You'd need to have a pointer to the vmap block in vmap area or embed > vmap area into vmap block. Got it now. Embedding vmap_area into vmap_block seems not feasible because va need be reused when inserting into free_vmap_area_root/list. Adding a pointer to vmap_block looks do-able. Since vm_map_ram area doesn't have vm_struct associated with it, we can reuse the space of '->vm' to add vb pointer like below. Since in the existing code there are places where we use 'if(!va->vm)' to check if it's a normal va, we need be careful to find all of them out and replace with new and tighter checking. Will give a draft code change after all is done and testing is passed. diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index c720be70c8dd..e2ba6d59d679 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -15,6 +15,7 @@ struct vm_area_struct; /* vma defining user mapping in mm_types.h */ struct notifier_block; /* in notifier.h */ struct iov_iter; /* in uio.h */ +struct vmap_block; /* in mm/vmalloc.c */ /* bits in flags of vmalloc's vm_struct below */ #define VM_IOREMAP 0x00000001 /* ioremap() and friends */ @@ -76,6 +77,7 @@ struct vmap_area { union { unsigned long subtree_max_size; /* in "free" tree */ struct vm_struct *vm; /* in "busy" tree */ + struct vmap_block *vb; /* in "busy and purge" tree */ }; unsigned long flags; /* mark type of vm_map_ram area */ }; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index c0f80982eb06..d97343271e27 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2061,6 +2061,10 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask) return ERR_PTR(err); } + spin_lock(&vmap_area_lock); + va->vb = vb; + spin_unlock(&vmap_area_lock); + vbq = raw_cpu_ptr(&vmap_block_queue); spin_lock(&vbq->lock); list_add_tail_rcu(&vb->free_list, &vbq->free);