RE: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page copy flow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: owner-linux-mm@xxxxxxxxx [mailto:owner-linux-mm@xxxxxxxxx] On
> Behalf Of Rik van Riel
> Sent: Tuesday, December 02, 2014 4:53 AM
> To: Shachar Raindel; linux-mm@xxxxxxxxx
> Cc: kirill.shutemov@xxxxxxxxxxxxxxx; mgorman@xxxxxxx;
> ak@xxxxxxxxxxxxxxx; matthew.r.wilcox@xxxxxxxxx;
> dave.hansen@xxxxxxxxxxxxxxx; n-horiguchi@xxxxxxxxxxxxx; akpm@linux-
> foundation.org; torvalds@xxxxxxxxxxxxxxxxxxxx; Haggai Eran;
> aarcange@xxxxxxxxxx; pfeiner@xxxxxxxxxx; hannes@xxxxxxxxxxx; Sagi
> Grimberg; walken@xxxxxxxxxx
> Subject: Re: [PATCH v2 3/4] mm: refactor do_wp_page, extract the page
> copy flow
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 12/01/2014 03:58 PM, Shachar Raindel wrote:
> > In some cases, do_wp_page had to copy the page suffering a write
> > fault to a new location. If the function logic decided that to do
> > this, it was done by jumping with a "goto" operation to the
> > relevant code block. This made the code really hard to understand.
> > It is also against the kernel coding style guidelines.
> >
> > This patch extracts the page copy and page table update logic to a
> > separate function. It also clean up the naming, from "gotten" to
> > "wp_page_copy", and adds few comments.
> >
> > Signed-off-by: Shachar Raindel <raindel@xxxxxxxxxxxx> ---
> > mm/memory.c | 265
> > +++++++++++++++++++++++++++++++++--------------------------- 1 file
> > changed, 147 insertions(+), 118 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c index b42bec0..c7c0df2
> > 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2088,6 +2088,146 @@
> > static int wp_page_reuse(struct mm_struct *mm, struct
> > vm_area_struct *vma, }
> >
> > /* + * Handle the case of a page which we actually need to copy to
> > a new page. + * + * Called with mmap_sem locked and the old page
> > referenced, but + * without the ptl held. + * + * High level logic
> > flow: + * + * - Allocate a page, copy the content of the old page
> > to the new one. + * - Handle book keeping and accounting - cgroups,
> > mmu-notifiers, etc. + * - Take the PTL. If the pte changed, bail
> > out and release the allocated page + * - If the pte is still the
> > way we remember it, update the page table and all + *   relevant
> > references. This includes dropping the reference the page-table + *
> > held to the old page, as well as updating the rmap. + * - In any
> > case, unlock the PTL and drop the reference we took to the old
> > page. + */ +static int wp_page_copy(struct mm_struct *mm, struct
> > vm_area_struct *vma, +			unsigned long address, pte_t
> *page_table,
> > pmd_t *pmd, +			pte_t orig_pte, struct page *old_page) +{ +
> 	struct
> > page *new_page = NULL; +	spinlock_t *ptl = NULL; +	pte_t entry; +
> > int page_copied = 0; +	const unsigned long mmun_start = address &
> > PAGE_MASK;	/* For mmu_notifiers */ +	const unsigned long mmun_end
> =
> > mmun_start + PAGE_SIZE;	/* For mmu_notifiers */ +	struct mem_cgroup
> > *memcg; + +	if (unlikely(anon_vma_prepare(vma))) +		goto
> oom; + +
> > if (is_zero_pfn(pte_pfn(orig_pte))) { +		new_page =
> > alloc_zeroed_user_highpage_movable(vma, address); +		if
> > (!new_page) +			goto oom; +	} else { +		new_page =
> > alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); +		if
> > (!new_page) +			goto oom; +		cow_user_page(new_page,
> old_page,
> > address, vma); +	} +	__SetPageUptodate(new_page); + +	if
> > (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) +
> 	goto
> > oom_free_new; + +	mmu_notifier_invalidate_range_start(mm,
> > mmun_start, mmun_end);
> 
> I believe the mmu_notifier_invalidate_range_start & _end
> functions can be moved inside the pte_same(*page_table, orig_pte)
> branch. There is no reason to call those functions if we do not
> modify the page table entry.
> 

There is a critical reason for this - moving the MMU notifiers there will
make them unsleepable. This will prevent hardware devices that keep secondary
PTEs from waiting for an interrupt to signal that the invalidation was completed. 
This is required for example by the ODP patch set 
(http://www.spinics.net/lists/linux-rdma/msg22044.html ) and
by the HMM patch set (http://comments.gmane.org/gmane.linux.kernel.mm/116584 ).

> This is not something introduced by your patch, but you might as
> well fix it while you're touching the code :)

This happened to be introduced by Haggai Eran 
(http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6bdb913f0a ),
who kindly reviewed this patchset before v0 was sent.

As mentioned above, I would prefer not to break the sleepability of the MMU notifiers in
this patchset which is targeting coding style cleanup. If you want to further discuss the
issue, can we split it to a different thread, not blocking the ack on this patch?

> 
> Other than that:
> 
> Acked-by: Rik van Riel <riel@xxxxxxxxxx>
> 
Thanks.

--Shachar
��.n������g����a����&ޖ)���)��h���&������梷�����Ǟ�m������)������^�����������v���O��zf������





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]