* Yajun Deng <yajun.deng@xxxxxxxxx> [240220 22:26]: > > On 2024/2/21 02:06, Liam R. Howlett wrote: > > * Yajun Deng <yajun.deng@xxxxxxxxx> [240218 21:30]: > > > Cc: Vlastimil, Lorenzo,Suren > > > > > > On 2024/2/18 10:31, Yajun Deng wrote: > > > > There are two types of iterators mas and vmi in the current code. If the > > > > maple tree comes from the mm struct, we can use vma iterator. Avoid using > > > > mas directly. > > Thanks for looking at this. > > > > I had left the maple state exposed in the mmap.c file because it does a > > number of operations that no one else does, so the new functions will be > > called a very limited number of times (as low as once). > > > > I think this is a worth while change since this may be needed in the > > future for dealing with more specialised uses of the tree. It also > > removes the 0/ULONG_MAX limits from certain calls, and the vma iterator > > names help explain things. > > > > I don't know why you treat the low/high search differently than the > > mas_reset() and mas_*_range(). In any case, the comment is inaccurate > > when mas_ functions are called with &vmi.mas in places. > > > > > > Because the mas_reset() and mas_*_range() only covert mas to vmi. It's > simple. > > But the low/high also covert max to max - 1. It's a little more complex. Ah, so the code doesn't match the comment, since the code will still use mas directly in this version. This was, perhaps, the largest issue with the patch. Having a good patch log is very important as people rely on it during reviews, but more importantly when tracking down an issue later on. I like the idea of removing as many mas uses as feasible, but we will still have a few that must be passed through, so please change the wording. > > > > > Leave the mt_detach tree keep using mas, as it doesn't come from the mm > > > > struct. > > Well, it's still VMAs from the mm struct. I agree that we should not > > change this for now. > > > > > > Convert all mas except mas_detach to vma iterator. And introduce > > > > vma_iter_area_{lowest, highest} helper functions for use vma interator. > > Do you mean mas functions? You do pass the maple state through to other > > areas. ie: free_pgtables(). > > > Yes. ... > > > > retry: > > > > - if (mas_empty_area_rev(&mas, low_limit, high_limit - 1, length)) > > > > + if (vma_iter_area_highest(&vmi, low_limit, high_limit, length)) > > > > return -ENOMEM; > > > > - gap = mas.last + 1 - info->length; > > > > + gap = vma_iter_end(&vmi) - info->length; > > > > gap -= (gap - info->align_offset) & info->align_mask; > > > > - gap_end = mas.last; > > > > - tmp = mas_next(&mas, ULONG_MAX); > > > > + gap_end = vma_iter_end(&vmi); > > vma_iter_end will return vmi->mas.last + 1, is what you have here > > correct? > > > > Yes, the following changes 'if (vm_start_gap(tmp) <= gap_end)' to 'if > (vm_start_gap(tmp) < gap_end)'. > > > > > + tmp = vma_next(&vmi); > > > > if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */ > > > > - if (vm_start_gap(tmp) <= gap_end) { > > > > + if (vm_start_gap(tmp) < gap_end) { Thanks. This works and the variable isn't used again. ... > > > > @@ -1959,11 +1958,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > > > > struct vm_area_struct *next; > > > > unsigned long gap_addr; > > > > int error = 0; > > > > - MA_STATE(mas, &mm->mm_mt, vma->vm_start, address); > > > > + VMA_ITERATOR(vmi, mm, 0); > > > > if (!(vma->vm_flags & VM_GROWSUP)) > > > > return -EFAULT; > > > > + vma_iter_config(&vmi, vma->vm_start, address); > > This is confusing. I think you are doing this so that the vma iterator > > is set up the same as the maple state, and not what is logically > > necessary? > > > Yes, VMA_ITERATOR can only pass one address. > > > > > > > /* Guard against exceeding limits of the address space. */ > > > > address &= PAGE_MASK; > > > > if (address >= (TASK_SIZE & PAGE_MASK)) > > > > @@ -1985,15 +1985,15 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > > > > } > > > > if (next) > > > > - mas_prev_range(&mas, address); > > > > + mas_prev_range(&vmi.mas, address); > > This isn't really hiding the maple state. > > > Okay, I will create a new helper function for this in the mm/internal.h. > > > > > > > > > - __mas_set_range(&mas, vma->vm_start, address - 1); > > > > - if (mas_preallocate(&mas, vma, GFP_KERNEL)) > > > > + vma_iter_config(&vmi, vma->vm_start, address); > > The above maple state changes is to get the maple state to point to the > > correct area for the preallocation call below. This seems unnecessary > > to me. > > > > We really should just set it up correctly. Unfortunately, with the VMA > > iterator, that's not really possible on initialization. > > > > What we can do is use the vma->vm_start for the initialization, then use > > vma_iter_config() here. That will not reset any state - but that's fine > > because the preallocation is the first call that actually uses it > > anyways. > > > > So we can initialize with vma->vm_start, don't call vma_iter_config > > until here, and also drop the if (next) part. > > > > This is possible here because it's not optimised like the > > expand_upwards() case, which uses the state to check prev and avoids an > > extra walk. > > > > Please make sure to test with the ltp tests on the stack combining, etc > > on a platform that expands down. > > > Okay, I will test it. Testing this can be tricky. Thanks for looking at it. ... > > > > mmap_write_lock(mm); > > > > mt_clear_in_rcu(&mm->mm_mt); > > > > - mas_set(&mas, vma->vm_end); > > > > - free_pgtables(&tlb, &mas, vma, FIRST_USER_ADDRESS, > > > > + vma_iter_set(&vmi, vma->vm_end); > > > > + free_pgtables(&tlb, &vmi.mas, vma, FIRST_USER_ADDRESS, > > > > USER_PGTABLES_CEILING, true); > > I guess the page tables still deal with the maple state directly then. > > > Yes. That's okay, we can leave that for another time. I believe Peng complicated the internals of free_pgtables() with his forking optimisation so care will need to be taken and should probably be done in another patch, another time. In fact, hiding xa_is_zero() within the vma iterator is going to be a really good thing to do, if performance doesn't suffer. Just don't state we are removing the use of maple state in the change log - since we do pass it through sometimes. ... Thanks again, Liam