On 2020-01-08 at 10:32 Wei Yang wrote:
>On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>
>>If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>fork independent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>
>>Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>previous VMA in parent mm.
>>
>>Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>because in current code 'prev' actually is same as 'pprev'.
>>
>>Third hidden problem is linking between VMA and anon-vmas whose pages it
>>could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>shared anon-vma isn't attached. But every mapped page stays reachable in
>>rmap because we erroneously share anon-vma from parent's previous VMA.
>>
>>This patch moves sharing logic out of anon_vma_clone() into more specific
>>anon_vma_fork() because this supposed to work only at fork() and simply
>>reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>
>>Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
>>Reported-by: Li Xinhai <lixinhai.lxh@xxxxxxxxx>
>>Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@xxxxxxxxxxxxxx/T/#t
>>---
>> include/linux/rmap.h |  3 ++-
>> kernel/fork.c        |  2 +-
>> mm/rmap.c            | 23 +++++++++--------------
>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>
>>diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>index 988d176472df..560e4480dcd0 100644
>>--- a/include/linux/rmap.h
>>+++ b/include/linux/rmap.h
>>@@ -143,7 +143,8 @@ void anon_vma_init(void); /* create anon_vma_cachep */
>> int __anon_vma_prepare(struct vm_area_struct *);
>> void unlink_anon_vmas(struct vm_area_struct *);
>> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>+                  struct vm_area_struct *prev);
>>
>> static inline int anon_vma_prepare(struct vm_area_struct *vma)
>> {
>>diff --git a/kernel/fork.c b/kernel/fork.c
>>index 2508a4f238a3..c33626993831 100644
>>--- a/kernel/fork.c
>>+++ b/kernel/fork.c
>>@@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>                         tmp->anon_vma = NULL;
>>                         if (anon_vma_prepare(tmp))
>>                                 goto fail_nomem_anon_vma_fork;
>>-                } else if (anon_vma_fork(tmp, mpnt))
>>+                } else if (anon_vma_fork(tmp, mpnt, prev))
>>                         goto fail_nomem_anon_vma_fork;
>>                 tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>                 tmp->vm_next = tmp->vm_prev = NULL;
>>diff --git a/mm/rmap.c b/mm/rmap.c
>>index b3e381919835..3c1e04389291 100644
>>--- a/mm/rmap.c
>>+++ b/mm/rmap.c
>>@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> {
>>         struct anon_vma_chain *avc, *pavc;
>>         struct anon_vma *root = NULL;
>>-        struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>-
>>-        /*
>>-         * If parent share anon_vma with its vm_prev, keep this sharing in in
>>-         * child.
>>-         *
>>-         * 1. Parent has vm_prev, which implies we have vm_prev.
>>-         * 2. Parent and its vm_prev have the same anon_vma.
>>-         */
>>-        if (!dst->anon_vma && src->anon_vma &&
>>-            pprev && pprev->anon_vma == src->anon_vma)
>>-                dst->anon_vma = prev->anon_vma;
>>-
>>
>>         list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>                 struct anon_vma *anon_vma;
>>@@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>  * the corresponding VMA in the parent process is attached to.
>>  * Returns 0 on success, non-zero on failure.
>>  */
>>-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>+                  struct vm_area_struct *prev)
>> {
>>         struct anon_vma_chain *avc;
>>         struct anon_vma *anon_vma;
>>@@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>         if (!pvma->anon_vma)
>>                 return 0;
>>
>>+        /* Share anon_vma with previous VMA if it has the same parent. */
>>+        if (prev && prev->anon_vma &&
>>+            prev->anon_vma->parent == pvma->anon_vma) {
>>+                vma->anon_vma = prev->anon_vma;
>>+                return anon_vma_clone(vma, prev);
>>+        }
>>+
>

Did the check in the first version already do the right thing, or did I
miss something here?

+        if (pvma->anon_vma && pprev && prev &&
+            pprev->anon_vma == pvma->anon_vma &&
+            pprev->vm_start == prev->vm_start)

The only nit is that 'pvma->anon_vma' has already been checked at the
beginning of this function, so it can be removed from this block.

>I am afraid this one change the intended behavior. Let's put a chart to
>describe.
>
>Commit 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when
>fork") tries to improve the following situation.
>
>Before the commit, the behavior is like this:
>
>Parent process:
>
>    +-----+
>    | pav |<-----------------+----------------------+
>    +-----+                  |                      |
>                             |                      |
>                       +-----------+          +-----------+
>                       |pprev      |          |pvma       |
>                       +-----------+          +-----------+
>
>Child Process
>
>
>    +-----+                     +-----+
>    | av1 |<-----------------+  | av2 |<------------+
>    +-----+                  |  +-----+             |
>                             |                      |
>                       +-----------+          +-----------+
>                       |prev       |          |vma        |
>                       +-----------+          +-----------+
>
>
>Parent pprev and pvma share the same anon_vma due to
>find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>anon_vma for child process's vma.
>
>The purpose of my commit is to give child process the following shape.
>
>    +-----+
>    | av  |<-----------------+----------------------+
>    +-----+                  |                      |
>                             |                      |
>                       +-----------+          +-----------+
>                       |prev       |          |vma        |
>                       +-----------+          +-----------+
>
>After this, we reduce the extra "av2" for child process. But yes, because of
>the two reasons you found, it didn't do the exact thing.
>
>While if my understanding is correct, the anon_vma_clone() would pick up any
>anon_vma in its process tree, except parent's. If this fails to get a reusable
>one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>
>Let me summarise original behavior:
>
>  * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>    it could not be pvma->anon_vma
>  * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>    pvma->anon_vma
>
>Then take a look into your code here.
>
>"prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>pvma's anon_vma. If my understanding is correct, this just match the second
>case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>
>But how about the first case? prev reuse an anon_vma in the process tree which
>is not parent's?
>
>>         /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>>         vma->anon_vma = NULL;
>>
>
>--
>Wei Yang
>Help you, Help me
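
To make the condition under discussion concrete, here is a minimal userspace
sketch (not kernel code) of the sharing check from the anon_vma_fork() hunk
quoted above: prev && prev->anon_vma && prev->anon_vma->parent ==
pvma->anon_vma. All toy_* names are hypothetical stand-ins invented for this
illustration. They model only the 'parent' pointer and deliberately ignore
anon_vma_chain, locking, and whatever reuse anon_vma_clone() may do, so treat
it as a way to reason about the condition rather than as a description of
the real behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (assumption): only the fields the check reads are modelled.
 * These are NOT the kernel's struct anon_vma / struct vm_area_struct. */
struct toy_anon_vma {
	struct toy_anon_vma *parent;
};

struct toy_vma {
	struct toy_anon_vma *anon_vma;
};

/* Mirrors the condition added to anon_vma_fork() in the quoted patch:
 * share only if prev's anon_vma is a direct child of the parent's one. */
static int toy_can_share(const struct toy_vma *prev, const struct toy_vma *pvma)
{
	return prev && prev->anon_vma &&
	       prev->anon_vma->parent == pvma->anon_vma;
}

/*
 * Simulate forking n parent VMAs (src) into child VMAs (dst).
 * A child VMA shares the previous child's anon_vma when the check passes;
 * otherwise it takes a fresh entry from pool[] whose parent is the parent
 * VMA's anon_vma. Parent VMAs without an anon_vma yield a child without one.
 * pool[] must have room for n entries. Returns the number of entries used.
 */
static int toy_fork(const struct toy_vma *src, struct toy_vma *dst, size_t n,
		    struct toy_anon_vma *pool)
{
	const struct toy_vma *prev = NULL;
	int used = 0;
	size_t i;

	assert(src && dst && pool);

	for (i = 0; i < n; i++) {
		dst[i].anon_vma = NULL;
		if (src[i].anon_vma) {
			if (toy_can_share(prev, &src[i])) {
				dst[i].anon_vma = prev->anon_vma;
			} else {
				pool[used].parent = src[i].anon_vma;
				dst[i].anon_vma = &pool[used++];
			}
		}
		prev = &dst[i];
	}
	return used;
}
```

In this model, three parent VMAs sharing one anon_vma make toy_fork() take a
single fresh entry from pool[], and all three child VMAs end up pointing at
it, which corresponds to the "DST1 will create new ANON1 and DST2 .. DSTn
will share it" description. If prev holds an anon_vma whose parent is
anything other than pvma->anon_vma, toy_can_share() returns 0; that is the
"first case" asked about above, and in the model the VMA would simply get a
fresh anon_vma instead of sharing.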