On Thu, Oct 17, 2024 at 12:42 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Thu, Oct 17, 2024 at 11:55 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Wed, Oct 16, 2024 at 7:02 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Oct 10, 2024 at 01:56:42PM GMT, Andrii Nakryiko wrote:
> > > > > To increase mm->mm_lock_seq robustness, switch it from int to long, so
> > > > > that it's a 64-bit counter on 64-bit systems and we can stop worrying
> > > > > about it wrapping around in just ~4 billion iterations. Same goes for
> > > > > VMA's matching vm_lock_seq, which is derived from mm_lock_seq.
> > >
> > > vm_lock_seq does not need to be long but for consistency I guess that
> >
> > How come, we literally assign vm_lock_seq from mm_lock_seq and do
> > direct comparisons. They have to be exactly the same type, no?
>
> Not necessarily. vm_lock_seq is a snapshot of the mm_lock_seq but it
> does not have to be a "complete" snapshot. Just something that has a
> very high probability of identifying a match and a rare false positive
> is not a problem (see comment in
> https://elixir.bootlin.com/linux/v6.11.3/source/include/linux/mm.h#L678).
> So, something like this for taking and comparing a snapshot would do:
>
> vma->vm_lock_seq = (unsigned int)mm->mm_lock_seq;
> if (vma->vm_lock_seq == (unsigned int)mm->mm_lock_seq)

Ah, ok, I see what's the idea behind it, makes sense.

> >
> > > makes sense. While at it, can you please change these seq counters to
> > > be unsigned?
> >
> > There is `vma->vm_lock_seq = -1;` in kernel/fork.c, should it be
> > switched to ULONG_MAX then? In general, unless this is critical for
> > correctness, I'd very much like stuff like this to be done in the mm
> > tree afterwards, but it seems trivial enough, so if you insist I'll do
> > it.
>
> Yeah, ULONG_MAX should work fine here. vma->vm_lock_seq is initialized
> to -1 to avoid a false initial match with mm->mm_lock_seq, which is
> initialized to 0. As I said, a false match is not a problem but if we
> can avoid it, that's better.

ok, ULONG_MAX and unsigned long it is, will update

> >
> > > Also, did you check with pahole if the vm_area_struct layout change
> > > pushes some members into a different cacheline or creates new gaps?
> > >
> >
> > Just did. We had a 3-byte hole after `bool detached;`, it now grew to 7
> > bytes (so +4) and then vm_lock_seq itself is now 8 bytes (so +4),
> > which now does push rb and rb_subtree_last into *THE SAME* cache line
> > (which sounds like an improvement to me). vm_lock_seq and vm_lock stay
> > in the same cache line. vm_pgoff and vm_file are now in the same cache
> > line, and given they are probably always accessed together, seems like
> > a good accidental change as well. See the before and after pahole
> > outputs below.
>
> Ok, sounds good to me. Looks like keeping both sequence numbers 64-bit
> is not an issue. Changing them to unsigned would be nice and trivial
> but I don't insist. You can add:
>
> Reviewed-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>

thanks, will switch to unsigned in the next revision (next week,
probably, to let some of the pending patches land)

> >
> > That singular detached bool looks like a complete waste, tbh. Maybe it
> > would be better to roll it into vm_flags and save 8 bytes? (not that I
> > want to do those mm changes in this patch set, of course...).
> > vm_area_struct is otherwise nicely tightly packed.
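
FWIW, to double-check my own understanding of that truncated-snapshot
idea, here's a tiny userspace model of the pattern quoted above. All
names are made up for illustration and it's obviously not the kernel
code, just a narrower snapshot of a wider sequence counter:

/* toy model of the vm_lock_seq-vs-mm_lock_seq snapshot idea */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_mm  { unsigned long mm_lock_seq; };
struct toy_vma { unsigned int  vm_lock_seq; };   /* narrower snapshot */

static void toy_take_snapshot(struct toy_vma *vma, const struct toy_mm *mm)
{
        /* keep only the low bits of the wider counter */
        vma->vm_lock_seq = (unsigned int)mm->mm_lock_seq;
}

static bool toy_seq_matches(const struct toy_vma *vma, const struct toy_mm *mm)
{
        return vma->vm_lock_seq == (unsigned int)mm->mm_lock_seq;
}

int main(void)
{
        struct toy_mm mm = { .mm_lock_seq = 0 };
        struct toy_vma vma = { .vm_lock_seq = UINT_MAX }; /* ~ the -1 init */

        printf("initial: %d\n", toy_seq_matches(&vma, &mm));        /* 0 */
        toy_take_snapshot(&vma, &mm);
        printf("after snapshot: %d\n", toy_seq_matches(&vma, &mm)); /* 1 */
        mm.mm_lock_seq++;
        printf("after bump: %d\n", toy_seq_matches(&vma, &mm));     /* 0 */
        return 0;
}

The only way this misreports a match is if the 64-bit counter advances
by an exact multiple of 2^32 between the snapshot and the check, which
is exactly the rare false positive tolerated above.
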
> >
> > tl;dr, seems fine, and detached would be best to get rid of, if
> > possible (but that's a completely separate thing)
>
> Yeah, I'll take a look at that. Thanks!
> >
> > BEFORE
> > ======
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*     0     8 */
> >                         long unsigned int vm_end;        /*     8     8 */
> >                 };                                       /*     0    16 */
> >                 struct callback_head vm_rcu;             /*     0    16 */
> >         } __attribute__((__aligned__(8)));               /*     0    16 */
> >         struct mm_struct *         vm_mm;                /*    16     8 */
> >         pgprot_t                   vm_page_prot;         /*    24     8 */
> >         union {
> >                 const vm_flags_t   vm_flags;             /*    32     8 */
> >                 vm_flags_t         __vm_flags;           /*    32     8 */
> >         };                                               /*    32     8 */
> >         bool                       detached;             /*    40     1 */
> >
> >         /* XXX 3 bytes hole, try to pack */
> >
> >         int                        vm_lock_seq;          /*    44     4 */
> >         struct vma_lock *          vm_lock;              /*    48     8 */
> >         struct {
> >                 struct rb_node     rb;                   /*    56    24 */
> >                 /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
> >                 long unsigned int  rb_subtree_last;      /*    80     8 */
> >         }                                                /*    56    32 */
> >         struct list_head           anon_vma_chain;       /*    88    16 */
> >         struct anon_vma *          anon_vma;             /*   104     8 */
> >         const struct vm_operations_struct * vm_ops;      /*   112     8 */
> >         long unsigned int          vm_pgoff;             /*   120     8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         struct file *              vm_file;              /*   128     8 */
> >         void *                     vm_private_data;      /*   136     8 */
> >         atomic_long_t              swap_readahead_info;  /*   144     8 */
> >         struct mempolicy *         vm_policy;            /*   152     8 */
> >         struct vma_numab_state *   numab_state;          /*   160     8 */
> >         struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   168     8 */
> >
> >         /* size: 176, cachelines: 3, members: 18 */
> >         /* sum members: 173, holes: 1, sum holes: 3 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 48 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > AFTER
> > =====
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*     0     8 */
> >                         long unsigned int vm_end;        /*     8     8 */
> >                 };                                       /*     0    16 */
> >                 struct callback_head vm_rcu;             /*     0    16 */
> >         } __attribute__((__aligned__(8)));               /*     0    16 */
> >         struct mm_struct *         vm_mm;                /*    16     8 */
> >         pgprot_t                   vm_page_prot;         /*    24     8 */
> >         union {
> >                 const vm_flags_t   vm_flags;             /*    32     8 */
> >                 vm_flags_t         __vm_flags;           /*    32     8 */
> >         };                                               /*    32     8 */
> >         bool                       detached;             /*    40     1 */
> >
> >         /* XXX 7 bytes hole, try to pack */
> >
> >         long int                   vm_lock_seq;          /*    48     8 */
> >         struct vma_lock *          vm_lock;              /*    56     8 */
> >         /* --- cacheline 1 boundary (64 bytes) --- */
> >         struct {
> >                 struct rb_node     rb;                   /*    64    24 */
> >                 long unsigned int  rb_subtree_last;      /*    88     8 */
> >         }                                                /*    64    32 */
> >         struct list_head           anon_vma_chain;       /*    96    16 */
> >         struct anon_vma *          anon_vma;             /*   112     8 */
> >         const struct vm_operations_struct * vm_ops;      /*   120     8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         long unsigned int          vm_pgoff;             /*   128     8 */
> >         struct file *              vm_file;              /*   136     8 */
> >         void *                     vm_private_data;      /*   144     8 */
> >         atomic_long_t              swap_readahead_info;  /*   152     8 */
> >         struct mempolicy *         vm_policy;            /*   160     8 */
> >         struct vma_numab_state *   numab_state;          /*   168     8 */
> >         struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   176     8 */
> >
> >         /* size: 184, cachelines: 3, members: 18 */
> >         /* sum members: 177, holes: 1, sum holes: 7 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 56 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > > > >
> > > > > I didn't use __u64 outright to keep 32-bit architectures unaffected, but
> > > > > if it seems important enough, I have nothing against using __u64.
> > > > >
> > > > > Suggested-by: Jann Horn <jannh@xxxxxxxxxx>
> > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > > > Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > >
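
P.S. For anyone skimming the pahole output above and wondering where
the extra 4 bytes of padding come from: it's just the usual alignment
rule when a 1-byte member is followed by an 8-byte-aligned one. Below
is a tiny standalone demo with toy structs that only mimic the relevant
members (not the real vm_area_struct), assuming a typical LP64 target:

/* demo: the hole after a bool grows when the next member goes int -> long */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct before { bool detached; int  seq; void *lock; }; /* int  vm_lock_seq */
struct after  { bool detached; long seq; void *lock; }; /* long vm_lock_seq */

int main(void)
{
        /* on LP64: seq lands at offset 4 vs 8, so the hole after
         * 'detached' grows from 3 to 7 bytes */
        printf("before: seq at %zu, size %zu\n",
               offsetof(struct before, seq), sizeof(struct before));
        printf("after:  seq at %zu, size %zu\n",
               offsetof(struct after, seq), sizeof(struct after));
        return 0;
}

Dropping the detached bool (e.g. by folding it into vm_flags, as
discussed above) would remove both the 1-byte member and the hole, so
those 8 bytes come back.
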