Re: [merged] mm-fix-potential-data-race-in-sys_swapon.patch removed from -mm tree

Adding Mel to Cc.

On Mon, 24 Aug 2015, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> 
> The patch titled
>      Subject: mm: fix potential data race in SyS_swapon
> has been removed from the -mm tree.  Its filename was
>      mm-fix-potential-data-race-in-sys_swapon.patch
> 
> This patch was dropped because it was merged into mainline or a subsystem tree

Administrative error?  I don't see this merged into mainline yet,
and didn't see your usual mail when you send in a batch to Linus.

And I wouldn't want it rushed too quickly to Linus: that stable
tag is barely justified, this is a very narrow race window that
has gone unnoticed for years, and swapon requires CAP_SYS_ADMIN.

But I also spotted Mel proposing a swap-over-NFS patch in this area
on LKML last Thursday: he appeared to be relying on the loop that I
remove here, so he might want to veto this one (though he can always
reinstate what he needs later, if that's how it plays out).

Hugh

> 
> ------------------------------------------------------
> From: Hugh Dickins <hughd@xxxxxxxxxx>
> Subject: mm: fix potential data race in SyS_swapon
> 
> While running KernelThreadSanitizer (ktsan) on upstream kernel with
> trinity, we got a few reports from SyS_swapon, here is one of them:
> 
> Read of size 8 by thread T307 (K7621):
>  [<     inlined    >] SyS_swapon+0x3c0/0x1850 SYSC_swapon mm/swapfile.c:2395
>  [<ffffffff812242c0>] SyS_swapon+0x3c0/0x1850 mm/swapfile.c:2345
>  [<ffffffff81e97c8a>] ia32_do_call+0x1b/0x25
> 
> Looks like the swap_lock should be taken when iterating through the
> swap_info array on lines 2392 - 2401: q->swap_file may be reset to NULL by
> another thread before it is dereferenced for f_mapping.
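
To illustrate the locking that the report above points at, here is a
minimal user-space sketch, not kernel code. Every name in it
(model_file, model_swap_info, model_entries, model_swap_lock and the
helper functions) is hypothetical, and a pthread mutex stands in for
swap_lock. It only shows the general idea: the reader takes the same
lock as the writer that resets swap_file to NULL, so the pointer cannot
change between the NULL check and the f_mapping dereference.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX 4

/* Hypothetical stand-ins for struct file and struct swap_info_struct. */
struct model_file {
	const void *f_mapping;
};

struct model_swap_info {
	struct model_file *swap_file;	/* NULL when the slot is unused */
};

/* A pthread mutex plays the role of swap_lock in this model. */
static pthread_mutex_t model_swap_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_swap_info model_entries[MODEL_MAX];

/* Writer side: install or clear a slot, always under the lock. */
static void model_set(int i, struct model_file *f)
{
	pthread_mutex_lock(&model_swap_lock);
	model_entries[i].swap_file = f;
	pthread_mutex_unlock(&model_swap_lock);
}

static void model_clear(int i)
{
	model_set(i, NULL);
}

/*
 * Reader side: the duplicate check. Holding the lock across the loop
 * means swap_file cannot be reset to NULL between the test and the
 * dereference of f_mapping.
 */
static bool model_mapping_in_use(const void *mapping)
{
	bool busy = false;
	int i;

	pthread_mutex_lock(&model_swap_lock);
	for (i = 0; i < MODEL_MAX; i++) {
		struct model_file *f = model_entries[i].swap_file;

		if (f && f->f_mapping == mapping) {
			busy = true;
			break;
		}
	}
	pthread_mutex_unlock(&model_swap_lock);
	return busy;
}
```

The patch itself does not add such locking: it removes the loop and
leaves the duplicate check to claim_swapfile().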
> 
> But why is that iteration needed at all?  Doesn't the claim_swapfile()
> which follows do all that is needed to check for a duplicate entry -
> FMODE_EXCL on a bdev, testing IS_SWAPFILE under i_mutex on a regfile?
> 
> Well, not quite: bd_may_claim() allows the same "holder" to claim the bdev
> again, so we do need to use a different holder than "sys_swapon"; and we
> should not replace appropriate -EBUSY by inappropriate -EINVAL.
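
The holder point above can be sketched in user space as follows. This
is a simplified model under stated assumptions, not the kernel's
bd_may_claim(): it ignores partitions and whole-device handling, and
the names model_bdev, model_may_claim, model_claim and
model_second_claim are hypothetical. It shows only why a holder shared
by every caller (such as the function pointer sys_swapon) lets a second
exclusive claim succeed, while a per-instance holder (such as p) gets
-EBUSY.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device: only the holder matters. */
struct model_bdev {
	void *holder;	/* NULL when nobody holds the device exclusively */
};

/* Simplified holder check: the same holder is allowed to claim again. */
static bool model_may_claim(const struct model_bdev *bdev, void *holder)
{
	if (bdev->holder == holder)
		return true;		/* already held by this holder */
	return bdev->holder == NULL;	/* free: ok; held by another: refuse */
}

/* Model of an exclusive open: 0 on success, -EBUSY if held by another. */
static int model_claim(struct model_bdev *bdev, void *holder)
{
	if (!model_may_claim(bdev, holder))
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

/* Claim one device twice and return the result of the second claim. */
static int model_second_claim(void *first, void *second)
{
	struct model_bdev bdev = { .holder = NULL };
	int error = model_claim(&bdev, first);

	if (error)
		return error;
	return model_claim(&bdev, second);
}
```

With one shared holder for both calls, model_second_claim() returns 0,
so the duplicate goes unnoticed; with two distinct holders it returns
-EBUSY, which the caller can pass back unchanged rather than turning
it into -EINVAL.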
> 
> Index i was reused in a cpu loop further down: renamed cpu there.
> 
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Reported-by: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
> Cc: Jason Low <jason.low2@xxxxxx>
> Cc: Cesar Eduardo Barros <cesarb@xxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Kostya Serebryany <kcc@xxxxxxxxxx>
> Cc: Alexander Potapenko <glider@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> 
>  mm/swapfile.c |   25 +++++++------------------
>  1 file changed, 7 insertions(+), 18 deletions(-)
> 
> diff -puN mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon mm/swapfile.c
> --- a/mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon
> +++ a/mm/swapfile.c
> @@ -2185,11 +2185,10 @@ static int claim_swapfile(struct swap_in
>  	if (S_ISBLK(inode->i_mode)) {
>  		p->bdev = bdgrab(I_BDEV(inode));
>  		error = blkdev_get(p->bdev,
> -				   FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> -				   sys_swapon);
> +				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
>  		if (error < 0) {
>  			p->bdev = NULL;
> -			return -EINVAL;
> +			return error;
>  		}
>  		p->old_block_size = block_size(p->bdev);
>  		error = set_blocksize(p->bdev, PAGE_SIZE);
> @@ -2390,7 +2389,6 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	struct filename *name;
>  	struct file *swap_file = NULL;
>  	struct address_space *mapping;
> -	int i;
>  	int prio;
>  	int error;
>  	union swap_header *swap_header;
> @@ -2430,19 +2428,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	p->swap_file = swap_file;
>  	mapping = swap_file->f_mapping;
> -
> -	for (i = 0; i < nr_swapfiles; i++) {
> -		struct swap_info_struct *q = swap_info[i];
> -
> -		if (q == p || !q->swap_file)
> -			continue;
> -		if (mapping == q->swap_file->f_mapping) {
> -			error = -EBUSY;
> -			goto bad_swap;
> -		}
> -	}
> -
>  	inode = mapping->host;
> +
>  	/* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
>  	error = claim_swapfile(p, inode);
>  	if (unlikely(error))
> @@ -2475,6 +2462,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  		goto bad_swap;
>  	}
>  	if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
> +		int cpu;
> +
>  		p->flags |= SWP_SOLIDSTATE;
>  		/*
>  		 * select a random position to start with to help wear leveling
> @@ -2493,9 +2482,9 @@ SYSCALL_DEFINE2(swapon, const char __use
>  			error = -ENOMEM;
>  			goto bad_swap;
>  		}
> -		for_each_possible_cpu(i) {
> +		for_each_possible_cpu(cpu) {
>  			struct percpu_cluster *cluster;
> -			cluster = per_cpu_ptr(p->percpu_cluster, i);
> +			cluster = per_cpu_ptr(p->percpu_cluster, cpu);
>  			cluster_set_null(&cluster->index);
>  		}
>  	}
> _
> 
> Patches currently in -mm which might be from hughd@xxxxxxxxxx are
> 
> mm-vmscan-unlock-page-while-waiting-on-writeback.patch