On 3/6/25 11:34, Lorenzo Stoakes wrote: > We invoke ksm_madvise() with an intentionally dummy flags field, so no > need to pass around. > > Additionally, the code tries to be 'clever' with account_start, > account_end, using these to both check that vma->vm_start != 0 and that we > ought to account the newly split portion of VMA post-move, either before > or after it. > > We need to do this because we intentionally removed VM_ACCOUNT on the VMA > prior to unmapping, so we don't erroneously unaccount memory (we have > already calculated the correct amount to account and accounted it, any > subsequent subtraction will be incorrect). > > This patch significantly expands the comment (from 2002!) about > 'concealing' the flag to make it abundantly clear what's going on, as well > as adding and expanding a number of other comments also. > > We can remove account_start, account_end by instead tracking when we > account (i.e. vma->vm_flags has the VM_ACCOUNT flag set, and this is not > an MREMAP_DONTUNMAP operation), and figuring out when to reinstate the > VM_ACCOUNT flag on prior/subsequent VMAs separately. > > We additionally break the function into logical pieces and attack the very > confusing error handling logic (where, for instance, new_addr is set to > err). > > After this change the code is considerably more readable and easy to > manipulate. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx> Some nits below: > +/* > + * Unmap source VMA for VMA move, turning it from a copy to a move, being > + * careful to ensure we do not underflow memory account while doing so if an > + * accountable move. > + * > + * This is best effort, if we fail to unmap then we simply try this looks like an unfinished sentence? > @@ -1007,51 +1157,15 @@ static unsigned long move_vma(struct vma_remap_struct *vrm) > */ > hiwater_vm = mm->hiwater_vm; This... > - /* Tell pfnmap has moved from this vma */ > - if (unlikely(vma->vm_flags & VM_PFNMAP)) > - untrack_pfn_clear(vma); > - > - if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) { > - /* We always clear VM_LOCKED[ONFAULT] on the old vma */ > - vm_flags_clear(vma, VM_LOCKED_MASK); > - > - /* > - * anon_vma links of the old vma is no longer needed after its page > - * table has been moved. > - */ > - if (new_vma != vma && vma->vm_start == old_addr && > - vma->vm_end == (old_addr + old_len)) > - unlink_anon_vmas(vma); > - > - /* Because we won't unmap we don't need to touch locked_vm */ > - vrm_stat_account(vrm, new_len); > - return new_addr; > - } > - > - vrm_stat_account(vrm, new_len); > - > - vma_iter_init(&vmi, mm, old_addr); > - if (do_vmi_munmap(&vmi, mm, old_addr, old_len, vrm->uf_unmap, false) < 0) { > - /* OOM: unable to split vma, just get accounts right */ > - if (vm_flags & VM_ACCOUNT && !(vrm->flags & MREMAP_DONTUNMAP)) > - vm_acct_memory(old_len >> PAGE_SHIFT); > - account_start = account_end = false; > - } > + vrm_stat_account(vrm, vrm->new_len); > + if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) > + dontunmap_complete(vrm, new_vma); > + else > + unmap_source_vma(vrm); > > mm->hiwater_vm = hiwater_vm; ... and this AFAICS only applies to the unmap_source_vma() case. And from the comment it seems only to the "undo destination vma" variant. BTW this also means the unmap_source_vma() name is rather misleading? So I think the whole handling of that hiwater_vm could move to unmap_source_vma(). It should not matter if we take the snapshot of hiwater_vm only after vrm_stat_account() bumps the total_vm. Nobody else can be racing with us to permanently turn that total_vm to a hiwater_vm. (this is probably a potential cleanup for a followup-patch anyway) > > - /* Restore VM_ACCOUNT if one or two pieces of vma left */ > - if (account_start) { > - vma = vma_prev(&vmi); > - vm_flags_set(vma, VM_ACCOUNT); > - } > - > - if (account_end) { > - vma = vma_next(&vmi); > - vm_flags_set(vma, VM_ACCOUNT); > - } > - > - return new_addr; > + return err ? (unsigned long)err : vrm->new_addr; > } > > /* > -- > 2.48.1