On Wed, Jan 15, 2025 at 7:10 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 11:58 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 1/15/25 04:15, Suren Baghdasaryan wrote:
> > > On Tue, Jan 14, 2025 at 6:27 PM Wei Yang <richard.weiyang@xxxxxxxxx> wrote:
> > >>
> > >> On Fri, Jan 10, 2025 at 08:26:03PM -0800, Suren Baghdasaryan wrote:
> > >>
> > >> >diff --git a/kernel/fork.c b/kernel/fork.c
> > >> >index 9d9275783cf8..151b40627c14 100644
> > >> >--- a/kernel/fork.c
> > >> >+++ b/kernel/fork.c
> > >> >@@ -449,6 +449,42 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
> > >> > 	return vma;
> > >> > }
> > >> >
> > >> >+static void vm_area_init_from(const struct vm_area_struct *src,
> > >> >+			      struct vm_area_struct *dest)
> > >> >+{
> > >> >+	dest->vm_mm = src->vm_mm;
> > >> >+	dest->vm_ops = src->vm_ops;
> > >> >+	dest->vm_start = src->vm_start;
> > >> >+	dest->vm_end = src->vm_end;
> > >> >+	dest->anon_vma = src->anon_vma;
> > >> >+	dest->vm_pgoff = src->vm_pgoff;
> > >> >+	dest->vm_file = src->vm_file;
> > >> >+	dest->vm_private_data = src->vm_private_data;
> > >> >+	vm_flags_init(dest, src->vm_flags);
> > >> >+	memcpy(&dest->vm_page_prot, &src->vm_page_prot,
> > >> >+	       sizeof(dest->vm_page_prot));
> > >> >+	/*
> > >> >+	 * src->shared.rb may be modified concurrently when called from
> > >> >+	 * dup_mmap(), but the clone will reinitialize it.
> > >> >+	 */
> > >> >+	data_race(memcpy(&dest->shared, &src->shared, sizeof(dest->shared)));
> > >> >+	memcpy(&dest->vm_userfaultfd_ctx, &src->vm_userfaultfd_ctx,
> > >> >+	       sizeof(dest->vm_userfaultfd_ctx));
> > >> >+#ifdef CONFIG_ANON_VMA_NAME
> > >> >+	dest->anon_name = src->anon_name;
> > >> >+#endif
> > >> >+#ifdef CONFIG_SWAP
> > >> >+	memcpy(&dest->swap_readahead_info, &src->swap_readahead_info,
> > >> >+	       sizeof(dest->swap_readahead_info));
> > >> >+#endif
> > >> >+#ifndef CONFIG_MMU
> > >> >+	dest->vm_region = src->vm_region;
> > >> >+#endif
> > >> >+#ifdef CONFIG_NUMA
> > >> >+	dest->vm_policy = src->vm_policy;
> > >> >+#endif
> > >> >+}
> > >>
> > >> Would this be difficult to maintain? We should make sure not to miss or
> > >> overwrite anything.
> > >
> > > Yeah, it is less maintainable than a simple memcpy() but I did not
> > > find a better alternative.
> >
> > Willy knows one but refuses to share it :(
>
> Ah, that reminds me why I dropped this approach :) But to be honest,
> back then we also had vma_clear() and that added to the ugliness. Now
> I could simply do this without all those macros:
>
> static inline void vma_copy(struct vm_area_struct *new,
>                             struct vm_area_struct *orig)
> {
>         /* Copy the vma while preserving vma->vm_lock */
>         data_race(memcpy(new, orig, offsetof(struct vm_area_struct, vm_lock)));
>         data_race(memcpy((void *)new + offsetofend(struct vm_area_struct, vm_lock),
>                          (void *)orig + offsetofend(struct vm_area_struct, vm_lock),
>                          sizeof(struct vm_area_struct) -
>                          offsetofend(struct vm_area_struct, vm_lock)));
> }
>
> Would that be better than the current approach?

I discussed the proposed alternatives with Willy and he prefers the current
field-by-field copy approach. I also tried using kmsan_check_memory() to
check for uninitialized memory in the vm_area_struct but unfortunately
KMSAN stumbles on the holes in this structure and there are 4 of them
(I attached pahole output at the end of this email).
I tried unpoisoning holes but that gets very ugly very fast. So, I posted v10
(https://lore.kernel.org/all/20250213224655.1680278-18-surenb@xxxxxxxxxx/)
without changing this part.

struct vm_area_struct {
	union {
		struct {
			unsigned long vm_start;              /*     0     8 */
			unsigned long vm_end;                /*     8     8 */
		};                                           /*     0    16 */
		freeptr_t          vm_freeptr;               /*     0     8 */
	};                                                   /*     0    16 */
	union {
		struct {
			unsigned long vm_start;              /*     0     8 */
			unsigned long vm_end;                /*     8     8 */
		};                                           /*     0    16 */
		freeptr_t          vm_freeptr;               /*     0     8 */
	};
	struct mm_struct *         vm_mm;                    /*    16     8 */
	pgprot_t                   vm_page_prot;             /*    24     8 */
	union {
		const vm_flags_t   vm_flags;                 /*    32     8 */
		vm_flags_t         __vm_flags;               /*    32     8 */
	};                                                   /*    32     8 */
	union {
		const vm_flags_t   vm_flags;                 /*     0     8 */
		vm_flags_t         __vm_flags;               /*     0     8 */
	};
	unsigned int               vm_lock_seq;              /*    40     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head           anon_vma_chain;           /*    48    16 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct anon_vma *          anon_vma;                 /*    64     8 */
	const struct vm_operations_struct  * vm_ops;         /*    72     8 */
	unsigned long              vm_pgoff;                 /*    80     8 */
	struct file *              vm_file;                  /*    88     8 */
	void *                     vm_private_data;          /*    96     8 */
	atomic_long_t              swap_readahead_info;      /*   104     8 */
	struct mempolicy *         vm_policy;                /*   112     8 */

	/* XXX 8 bytes hole, try to pack */

	/* --- cacheline 2 boundary (128 bytes) --- */
	refcount_t                 vm_refcnt __attribute__((__aligned__(64))); /*   128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct {
		struct rb_node     rb __attribute__((__aligned__(8))); /*   136    24 */
		unsigned long      rb_subtree_last;          /*   160     8 */
	} shared;                                            /*   136    32 */
	struct {
		struct rb_node     rb __attribute__((__aligned__(8))); /*     0    24 */
		unsigned long      rb_subtree_last;          /*    24     8 */

		/* size: 32, cachelines: 1, members: 2 */
		/* forced alignments: 1 */
		/* last cacheline: 32 bytes */
	};
	struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;       /*   168     0 */

	/* size: 192, cachelines: 3, members: 16 */
	/* sum members: 152, holes: 3, sum holes: 16 */
	/* padding: 24 */
	/* forced alignments: 1, forced holes: 1, sum forced holes: 8 */
};

> > > I added a warning above the
> > > struct vm_area_struct definition to update this function every time we change
> > > that structure. Not sure if there is anything else I can do to help
> > > with this.
> > >
> > >>
> > >> --
> > >> Wei Yang
> > >> Help you, Help me
> >