On Mon, 2008-03-24 at 11:24 -0700, Hiroshi Shimamoto wrote:
> Hi Peter,
>
> I've updated the patch. Could you please review it?
>
> I'm also thinking that it can be in the mainline because it makes
> the lock period shorter, correct?

Possibly yeah, Nick, Hugh?

> ---
> From: Hiroshi Shimamoto <h-shimamoto@xxxxxxxxxxxxx>
>
> There is a deadlock scenario: remove_mapping() vs. free_swap_and_cache().
> remove_mapping() turns the PG_nonewrefs bit on, then takes swap_lock.
> free_swap_and_cache() takes swap_lock, then waits in find_get_page()
> for the PG_nonewrefs bit to be turned off.
>
> swap_lock can be dropped before calling find_get_page().
>
> In remove_exclusive_swap_page() there is a similar lock sequence:
> swap_lock, then the PG_nonewrefs bit. swap_lock can be dropped before
> turning the PG_nonewrefs bit on.

I worried about this: once we free the swap entry with swap_entry_free()
and drop swap_lock, another task is basically free to re-use that swap
location and try to insert another page in that same spot in
add_to_swap() - read_swap_cache_async() can't race, because that would
mean it still has a swap entry pinned.

However, add_to_swap() can already handle the race, because it used to
race against read_swap_cache_async(). It also swap_free()s the entry so
as not to leak entries.

So I think this is indeed correct.

[ I ought to find some time to port the concurrent page-cache patches
  on top of Nick's latest lockless series; Hugh's suggestion makes the
  speculative get much nicer. ]

> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@xxxxxxxxxxxxx>

Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>

> ---
>  mm/swapfile.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 5036b70..6fbc77e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -366,6 +366,7 @@ int remove_exclusive_swap_page(struct page *page)
>  	/* Is the only swap cache user the cache itself? */
>  	retval = 0;
>  	if (p->swap_map[swp_offset(entry)] == 1) {
> +		spin_unlock(&swap_lock);
>  		/* Recheck the page count with the swapcache lock held.. */
>  		lock_page_ref_irq(page);
>  		if ((page_count(page) == 2) && !PageWriteback(page)) {
> @@ -374,8 +375,8 @@ int remove_exclusive_swap_page(struct page *page)
>  			retval = 1;
>  		}
>  		unlock_page_ref_irq(page);
> -	}
> -	spin_unlock(&swap_lock);
> +	} else
> +		spin_unlock(&swap_lock);
>
>  	if (retval) {
>  		swap_free(entry);
> @@ -400,13 +401,14 @@ void free_swap_and_cache(swp_entry_t entry)
>  	p = swap_info_get(entry);
>  	if (p) {
>  		if (swap_entry_free(p, swp_offset(entry)) == 1) {
> +			spin_unlock(&swap_lock);
>  			page = find_get_page(&swapper_space, entry.val);
>  			if (page && unlikely(TestSetPageLocked(page))) {
>  				page_cache_release(page);
>  				page = NULL;
>  			}
> -		}
> -		spin_unlock(&swap_lock);
> +		} else
> +			spin_unlock(&swap_lock);
>  	}
>  	if (page) {
>  		int one_user;
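
For reference, the inversion being fixed is nothing swap-specific; it is
the classic ABBA pattern. Below is a minimal userspace sketch of it,
with two pthread mutexes standing in for swap_lock and for the
PG_nonewrefs wait in find_get_page(). All names in it are illustrative,
not the kernel's, and it is only a sketch of the locking order, not of
the actual code paths:

/*
 * ABBA sketch: mutex A stands in for swap_lock, mutex B for the
 * PG_nonewrefs wait in find_get_page().  Illustrative only.
 * Build with: cc abba.c -o abba -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER; /* "swap_lock" */
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER; /* "PG_nonewrefs" */

/* remove_mapping() side: takes B, then A. */
static void *side1(void *unused)
{
	pthread_mutex_lock(&B);		/* SetPageNoNewRefs() */
	pthread_mutex_lock(&A);		/* spin_lock(&swap_lock) */
	pthread_mutex_unlock(&A);
	pthread_mutex_unlock(&B);
	return NULL;
}

/* Old free_swap_and_cache(): A, then B -- reverse order, can deadlock. */
static void *side2_old(void *unused)
{
	pthread_mutex_lock(&A);
	pthread_mutex_lock(&B);		/* find_get_page() waiting */
	pthread_mutex_unlock(&B);
	pthread_mutex_unlock(&A);
	return NULL;
}

/* Patched free_swap_and_cache(): drop A before touching B. */
static void *side2_new(void *unused)
{
	pthread_mutex_lock(&A);
	/* swap_entry_free() would happen here, under the lock */
	pthread_mutex_unlock(&A);
	pthread_mutex_lock(&B);		/* no other lock held now */
	pthread_mutex_unlock(&B);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, side1, NULL);
	pthread_create(&t2, NULL, side2_new, NULL); /* side2_old can hang */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("done, no deadlock\n");
	return 0;
}

With side2_old, the two threads can block on each other forever once the
acquisitions interleave (t1 holds B and wants A, t2 holds A and wants
B). With side2_new, no thread ever holds A while waiting for B, which is
exactly what moving the spin_unlock(&swap_lock) up in the patch achieves.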