Re: [patch 3/4 v4]swap: fix races exposed by swap discard

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 26, 2013 at 01:38:27PM +0800, Shaohua Li wrote:
> Last patch can expose races, according to Hugh:
> 
> swapoff was sometimes failing with "Cannot allocate memory", coming from
> try_to_unuse()'s -ENOMEM: it needs to allow for swap_duplicate() failing on a
> free entry temporarily SWAP_MAP_BAD while being discarded.
> 
> We should use ACCESS_ONCE() there, and whenever accessing swap_map locklessly;
> but rather than peppering it throughout try_to_unuse(), just declare *swap_map
> with volatile.
> 
> try_to_unuse() is accustomed to *swap_map going down racily, but not
> necessarily to it jumping up from 0 to SWAP_MAP_BAD: we'll be safer to prevent
> that transition once SWP_WRITEOK is switched off, when it's a waste of time to
> issue discards anyway (swapon can do a whole discard).
> 
> Another issue is:
> 
> In swapin_readahead(), read_swap_cache_async() can read a bad swap entry,
> because we don't check if readahead swap entry is bad. This doesn't break
> anything but such swapin page is wasteful and can only be freed at page
> reclaim. We avoid read such swap entry.
> 
> And next patch will mark a swap entry bad temporarily for discard. Without this
> patch, swap entry count will be messed.
> 
> Thanks Hugh to inspire swapin_readahead could use bad swap entry.
> 
> [include Hugh's patch 'swap: fix swapoff ENOMEMs from discard']
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
> ---

Acked-by: Rafael Aquini <aquini@xxxxxxxxxx>


>  mm/swapfile.c |   15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> Index: linux/mm/swapfile.c
> ===================================================================
> --- linux.orig/mm/swapfile.c	2013-03-22 17:28:06.000000000 +0800
> +++ linux/mm/swapfile.c	2013-03-22 17:40:51.580356594 +0800
> @@ -331,7 +331,8 @@ static inline void dec_cluster_info_page
>  		 * instead of free it immediately. The cluster will be freed
>  		 * after discard.
>  		 */
> -		if (p->flags & SWP_DISCARDABLE) {
> +		if ((p->flags & (SWP_WRITEOK | SWP_DISCARDABLE)) ==
> +				 (SWP_WRITEOK | SWP_DISCARDABLE)) {
>  			swap_cluster_schedule_discard(p, idx);
>  			return;
>  		}
> @@ -1228,7 +1229,7 @@ static unsigned int find_next_to_unuse(s
>  			else
>  				continue;
>  		}
> -		count = si->swap_map[i];
> +		count = ACCESS_ONCE(si->swap_map[i]);
>  		if (count && swap_count(count) != SWAP_MAP_BAD)
>  			break;
>  	}
> @@ -1248,7 +1249,7 @@ int try_to_unuse(unsigned int type, bool
>  {
>  	struct swap_info_struct *si = swap_info[type];
>  	struct mm_struct *start_mm;
> -	unsigned char *swap_map;
> +	volatile unsigned char *swap_map;	/* ACCESS_ONCE throughout */
>  	unsigned char swcount;
>  	struct page *page;
>  	swp_entry_t entry;
> @@ -1299,7 +1300,8 @@ int try_to_unuse(unsigned int type, bool
>  			 * reused since sys_swapoff() already disabled
>  			 * allocation from here, or alloc_page() failed.
>  			 */
> -			if (!*swap_map)
> +			swcount = *swap_map;
> +			if (!swcount || swcount == SWAP_MAP_BAD)
>  				continue;
>  			retval = -ENOMEM;
>  			break;
> @@ -2432,6 +2434,11 @@ static int __swap_duplicate(swp_entry_t
>  		goto unlock_out;
>  
>  	count = p->swap_map[offset];
> +	if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
> +		err = -ENOENT;
> +		goto unlock_out;
> +	}
> +
>  	has_cache = count & SWAP_HAS_CACHE;
>  	count &= ~SWAP_HAS_CACHE;
>  	err = 0;
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]