On Mon, Mar 10, 2025 at 04:11:59PM +0100, Vlastimil Babka wrote: > On 3/6/25 11:34, Lorenzo Stoakes wrote: > > We invoke ksm_madvise() with an intentionally dummy flags field, so no > > need to pass around. > > > > Additionally, the code tries to be 'clever' with account_start, > > account_end, using these to both check that vma->vm_start != 0 and that we > > ought to account the newly split portion of VMA post-move, either before > > or after it. > > > > We need to do this because we intentionally removed VM_ACCOUNT on the VMA > > prior to unmapping, so we don't erroneously unaccount memory (we have > > already calculated the correct amount to account and accounted it, any > > subsequent subtraction will be incorrect). > > > > This patch significantly expands the comment (from 2002!) about > > 'concealing' the flag to make it abundantly clear what's going on, as well > > as adding and expanding a number of other comments also. > > > > We can remove account_start, account_end by instead tracking when we > > account (i.e. vma->vm_flags has the VM_ACCOUNT flag set, and this is not > > an MREMAP_DONTUNMAP operation), and figuring out when to reinstate the > > VM_ACCOUNT flag on prior/subsequent VMAs separately. > > > > We additionally break the function into logical pieces and attack the very > > confusing error handling logic (where, for instance, new_addr is set to > > err). > > > > After this change the code is considerably more readable and easy to > > manipulate. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> > > Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx> Thanks! > > Some nits below: > > > +/* > > + * Unmap source VMA for VMA move, turning it from a copy to a move, being > > + * careful to ensure we do not underflow memory account while doing so if an > > + * accountable move. > > + * > > + * This is best effort, if we fail to unmap then we simply try > > this looks like an unfinished sentence? Damn yeah, will fix. Will wait for rest of your review before either fix-patching or respinning. > > > @@ -1007,51 +1157,15 @@ static unsigned long move_vma(struct vma_remap_struct *vrm) > > */ > > hiwater_vm = mm->hiwater_vm; > > This... > > > - /* Tell pfnmap has moved from this vma */ > > - if (unlikely(vma->vm_flags & VM_PFNMAP)) > > - untrack_pfn_clear(vma); > > - > > - if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) { > > - /* We always clear VM_LOCKED[ONFAULT] on the old vma */ > > - vm_flags_clear(vma, VM_LOCKED_MASK); > > - > > - /* > > - * anon_vma links of the old vma is no longer needed after its page > > - * table has been moved. > > - */ > > - if (new_vma != vma && vma->vm_start == old_addr && > > - vma->vm_end == (old_addr + old_len)) > > - unlink_anon_vmas(vma); > > - > > - /* Because we won't unmap we don't need to touch locked_vm */ > > - vrm_stat_account(vrm, new_len); > > - return new_addr; > > - } > > - > > - vrm_stat_account(vrm, new_len); > > - > > - vma_iter_init(&vmi, mm, old_addr); > > - if (do_vmi_munmap(&vmi, mm, old_addr, old_len, vrm->uf_unmap, false) < 0) { > > - /* OOM: unable to split vma, just get accounts right */ > > - if (vm_flags & VM_ACCOUNT && !(vrm->flags & MREMAP_DONTUNMAP)) > > - vm_acct_memory(old_len >> PAGE_SHIFT); > > - account_start = account_end = false; > > - } > > + vrm_stat_account(vrm, vrm->new_len); > > + if (unlikely(!err && (vrm->flags & MREMAP_DONTUNMAP))) > > + dontunmap_complete(vrm, new_vma); > > + else > > + unmap_source_vma(vrm); > > > > mm->hiwater_vm = hiwater_vm; > > ... and this AFAICS only applies to the unmap_source_vma() case. And from > the comment it seems only to the "undo destination vma" variant. BTW this > also means the unmap_source_vma() name is rather misleading? I agree it's only meaningful in that case, I am not sure what you mean about the undo destination vma case? We have 3 situations in which we call unmap_source_vma(): 1. normal, no error move. 2. error has occured, so we set the source to destination and vice-versa. 3. MREMAP_DONTUNMAP, error has occured, so we set the source to destination and vice-versa. I _hate_ this error handling where we reverse the source/destination, it's so confusing, but since we are doing that, I still think unmap_source_vma() is meaningful. unmap_old_vma() is possible too, but I think less clear... The beter fix I think is to find a less awful way to do the error handling. > > So I think the whole handling of that hiwater_vm could move to > unmap_source_vma(). It should not matter if we take the snapshot of > hiwater_vm only after vrm_stat_account() bumps the total_vm. Nobody else can > be racing with us to permanently turn that total_vm to a hiwater_vm. > > (this is probably a potential cleanup for a followup-patch anyway) Yup agreed, but let's do it as a follow up as this is how it is in the original code. Will add to the whiteboard TODO... > > > > > - /* Restore VM_ACCOUNT if one or two pieces of vma left */ > > - if (account_start) { > > - vma = vma_prev(&vmi); > > - vm_flags_set(vma, VM_ACCOUNT); > > - } > > - > > - if (account_end) { > > - vma = vma_next(&vmi); > > - vm_flags_set(vma, VM_ACCOUNT); > > - } > > - > > - return new_addr; > > + return err ? (unsigned long)err : vrm->new_addr; > > } > > > > /* > > -- > > 2.48.1 >