On Fri, 4 Nov 2011, Nai Xia wrote: > On Fri, Nov 4, 2011 at 3:31 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote: > > On Mon, 31 Oct 2011, Andrea Arcangeli wrote: > >> @@ -2339,7 +2339,15 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, > >> */ > >> if (vma_start >= new_vma->vm_start && > >> vma_start < new_vma->vm_end) > >> + /* > >> + * No need to call anon_vma_order_tail() in > >> + * this case because the same PT lock will > >> + * serialize the rmap_walk against both src > >> + * and dst vmas. > >> + */ > > > > Really? Please convince me: I just do not see what ensures that > > the same pt lock covers both src and dst areas in this case. > > At the first glance that rmap_walk does travel this merged VMA > once... > But, Now, Wait...., I am actually really puzzled that this case can really > happen at all, you see that vma_merge() does not break the validness > between page->index and its VMA. So if this can really happen, > a page->index should be valid in both areas in a same VMA. > It's strange to imagine that a PTE is copy inside a _same_ VMA > and page->index is valid at both old and new places. Yes, I think you are right, thank you for elucidating it. That was a real case when we wrote copy_vma(), when rmap was using pte_chains; but once anon_vma came in, and imposed vm_pgoff matching on anonymous mappings too, it became dead code. With linear vm_pgoff matching, you cannot fit a range in two places within the same vma. (And even the non-linear case relies upon vm_pgoff defaults.) So we could simplify the copy_vma() interface a little now (get rid of that nasty **vmap): I'm not quite sure whether we ought to do that, but certainly Andrea's comment there should be updated (if he also agrees with your analysis). > > IMO, the only case that src VMA can be merged by the new > is that src VMA hasn't been faulted yet and the pgoff > is recalculated. And if my reasoning is true, this place > does not need to be worried about. I don't see a place where "the pgoff is recalculated" (except in the consistent way when expanding or splitting or merging vma), nor where vma merge would allow for variable pgoff. I agree that we could avoid finalizing vm_pgoff for an anonymous area until its anon_vma is assigned: were you imagining doing that in future, or am I overlooking something already there? Hugh