Re: [PATCH v1 2/3] mm/gup: use gup_can_follow_protnone() also in GUP-fast

On 8/30/22 11:53, David Hildenbrand wrote:
> Good, I managed to attract the attention of someone who understands that machinery :)
> 
> While validating whether GUP-fast and PageAnonExclusive code work correctly,
> I started looking at the whole RCU GUP-fast machinery. I do have a patch to
> improve PageAnonExclusive clearing (I think we're missing memory barriers to
> make it work as expected in any possible case), but I also stumbled eventually
> over a more generic issue that might need memory barriers.
> 
> Any thoughts whether I am missing something or this is actually missing
> memory barriers?
> 

It's actually missing memory barriers.

In fact, others have had that same thought! [1] :) In that 2019 thread,
I recall that this got dismissed because of a focus on the IPI-based
aspect of GUP-fast synchronization (there was some hand waving, perhaps
accurate waving, about memory barriers vs. CPU interrupts). But now that
the RCU (non-IPI) implementation is more widely used than it used to be,
the issue is clearer.

> 
> From ce8c941c11d1f60cea87a3e4d941041dc6b79900 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@xxxxxxxxxx>
> Date: Mon, 29 Aug 2022 16:57:07 +0200
> Subject: [PATCH] mm/gup: update refcount+pincount before testing if the PTE
>  changed
> 
> mm/ksm.c:write_protect_page() has to make sure that no unknown
> references to a mapped page exist and that no additional ones with write
> permissions are possible -- unknown references could have write permissions
> and modify the page afterwards.
> 
> Conceptually, mm/ksm.c:write_protect_page() consists of:
>   (1) Clear/invalidate PTE
>   (2) Check if there are unknown references; back off if so.
>   (3) Update PTE (e.g., map it R/O)
> 
> Conceptually, GUP-fast code consists of:
>   (1) Read the PTE
>   (2) Increment refcount/pincount of the mapped page
>   (3) Check if the PTE changed by re-reading it; back off if so.
> 
> To make sure GUP-fast won't be able to grab additional references after
> clearing the PTE, but will properly detect the change and back off, we
> need a memory barrier between updating the refcount/pincount and checking
> if it changed.
> 
> try_grab_folio() doesn't necessarily imply a memory barrier, so add an
> explicit smp_mb__after_atomic() after the atomic RMW operation to
> increment the refcount and pincount.
> 
> ptep_clear_flush() used to clear the PTE and flush the TLB should imply
> a memory barrier for flushing the TLB, so don't add another one for now.
> 
> PageAnonExclusive handling requires further care and will be handled
> separately.
> 
> Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
>  mm/gup.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 5abdaf487460..0008b808f484 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2392,6 +2392,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>  			goto pte_unmap;
>  		}
>  
> +		/*
> +		 * Update refcount/pincount before testing for changed PTE. This
> +		 * is required for code like mm/ksm.c:write_protect_page() that
> +		 * wants to make sure that a page has no unknown references
> +		 * after clearing the PTE.
> +		 */
> +		smp_mb__after_atomic();
> +
>  		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  			gup_put_folio(folio, 1, flags);
>  			goto pte_unmap;
> @@ -2577,6 +2585,9 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */

Don't we usually also identify what each mb pairs with, in the comments? That would help.

> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2643,6 +2654,9 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;
> @@ -2683,6 +2697,9 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>  	if (!folio)
>  		return 0;
>  
> +	/* See gup_pte_range(). */
> +	smp_mb__after_atomic();
> +
>  	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
>  		gup_put_folio(folio, refs, flags);
>  		return 0;


[1] https://lore.kernel.org/lkml/9465df76-0229-1b44-5646-5cced1bc1718@xxxxxxxxxx/


thanks,

-- 
John Hubbard
NVIDIA
