Adding Mel to Cc.

On Mon, 24 Aug 2015, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> 
> The patch titled
>      Subject: mm: fix potential data race in SyS_swapon
> has been removed from the -mm tree. Its filename was
>      mm-fix-potential-data-race-in-sys_swapon.patch
> 
> This patch was dropped because it was merged into mainline or a subsystem tree

Administrative error? I don't see this merged into mainline yet,
and didn't see your usual mail when you send in a batch to Linus.

And I wouldn't want it rushed too quickly to Linus: that stable tag
is barely justified, this is a very narrow race window that has gone
unnoticed for years, and swapon requires CAP_SYS_ADMIN.

But also I spotted Mel proposing a swap-over-NFS patch in this area
on LKML last Thursday: he appeared to be relying on the loop that I
remove here, so he might want to veto this one (though can always
reinstate what he needs later, if that's how it plays out).

Hugh

> 
> ------------------------------------------------------
> From: Hugh Dickins <hughd@xxxxxxxxxx>
> Subject: mm: fix potential data race in SyS_swapon
> 
> While running KernelThreadSanitizer (ktsan) on upstream kernel with
> trinity, we got a few reports from SyS_swapon, here is one of them:
> 
> Read of size 8 by thread T307 (K7621):
>  [< inlined >] SyS_swapon+0x3c0/0x1850 SYSC_swapon mm/swapfile.c:2395
>  [<ffffffff812242c0>] SyS_swapon+0x3c0/0x1850 mm/swapfile.c:2345
>  [<ffffffff81e97c8a>] ia32_do_call+0x1b/0x25
> 
> Looks like the swap_lock should be taken when iterating through the
> swap_info array on lines 2392 - 2401: q->swap_file may be reset to NULL by
> another thread before it is dereferenced for f_mapping.
> 
> But why is that iteration needed at all? Doesn't the claim_swapfile()
> which follows do all that is needed to check for a duplicate entry -
> FMODE_EXCL on a bdev, testing IS_SWAPFILE under i_mutex on a regfile?
> 
> Well, not quite: bd_may_claim() allows the same "holder" to claim the bdev
> again, so we do need to use a different holder than "sys_swapon"; and we
> should not replace appropriate -EBUSY by inappropriate -EINVAL.
> 
> Index i was reused in a cpu loop further down: renamed cpu there.
> 
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Reported-by: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
> Cc: Jason Low <jason.low2@xxxxxx>
> Cc: Cesar Eduardo Barros <cesarb@xxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> Cc: Kostya Serebryany <kcc@xxxxxxxxxx>
> Cc: Alexander Potapenko <glider@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> 
>  mm/swapfile.c |   25 +++++++------------------
>  1 file changed, 7 insertions(+), 18 deletions(-)
> 
> diff -puN mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon mm/swapfile.c
> --- a/mm/swapfile.c~mm-fix-potential-data-race-in-sys_swapon
> +++ a/mm/swapfile.c
> @@ -2185,11 +2185,10 @@ static int claim_swapfile(struct swap_in
>  	if (S_ISBLK(inode->i_mode)) {
>  		p->bdev = bdgrab(I_BDEV(inode));
>  		error = blkdev_get(p->bdev,
> -				   FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> -				   sys_swapon);
> +				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
>  		if (error < 0) {
>  			p->bdev = NULL;
> -			return -EINVAL;
> +			return error;
>  		}
>  		p->old_block_size = block_size(p->bdev);
>  		error = set_blocksize(p->bdev, PAGE_SIZE);
> @@ -2390,7 +2389,6 @@ SYSCALL_DEFINE2(swapon, const char __use
>  	struct filename *name;
>  	struct file *swap_file = NULL;
>  	struct address_space *mapping;
> -	int i;
>  	int prio;
>  	int error;
>  	union swap_header *swap_header;
> @@ -2430,19 +2428,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  
>  	p->swap_file = swap_file;
>  	mapping = swap_file->f_mapping;
> -
> -	for (i = 0; i < nr_swapfiles; i++) {
> -		struct swap_info_struct *q = swap_info[i];
> -
> -		if (q == p || !q->swap_file)
> -			continue;
> -		if (mapping == q->swap_file->f_mapping) {
> -			error = -EBUSY;
> -			goto bad_swap;
> -		}
> -	}
> -
>  	inode = mapping->host;
> +
>  	/* If S_ISREG(inode->i_mode) will do mutex_lock(&inode->i_mutex); */
>  	error = claim_swapfile(p, inode);
>  	if (unlikely(error))
> @@ -2475,6 +2462,8 @@ SYSCALL_DEFINE2(swapon, const char __use
>  		goto bad_swap;
>  	}
>  	if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
> +		int cpu;
> +
>  		p->flags |= SWP_SOLIDSTATE;
>  		/*
>  		 * select a random position to start with to help wear leveling
> @@ -2493,9 +2482,9 @@ SYSCALL_DEFINE2(swapon, const char __use
>  			error = -ENOMEM;
>  			goto bad_swap;
>  		}
> -		for_each_possible_cpu(i) {
> +		for_each_possible_cpu(cpu) {
>  			struct percpu_cluster *cluster;
> -			cluster = per_cpu_ptr(p->percpu_cluster, i);
> +			cluster = per_cpu_ptr(p->percpu_cluster, cpu);
>  			cluster_set_null(&cluster->index);
>  		}
>  	}
> _
> 
> Patches currently in -mm which might be from hughd@xxxxxxxxxx are
> 
> mm-vmscan-unlock-page-while-waiting-on-writeback.patch
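
On the holder point in the quoted changelog (bd_may_claim() allows the same
"holder" to claim the bdev again), here is a rough user-space sketch of why
a shared holder cookie cannot detect a duplicate swapon. It is only a toy
model, not kernel code: struct toy_bdev, toy_claim() and toy_release() are
hypothetical names invented for illustration, and the real block layer does
considerably more than this.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device: records its exclusive holder. */
struct toy_bdev {
	const void *holder;	/* NULL when unclaimed */
	int holders;		/* successful claims made by that holder */
};

/*
 * Toy model of an exclusive claim keyed on a holder cookie.
 * Returns 0 on success, -EBUSY if a different holder owns the device.
 * A repeat claim by the same holder succeeds, which is the behaviour
 * the changelog attributes to bd_may_claim().
 */
static int toy_claim(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->holders++;
	return 0;
}

/* Drop one claim; the device is free again once the last claim is gone. */
static void toy_release(struct toy_bdev *bdev)
{
	if (bdev->holders > 0 && --bdev->holders == 0)
		bdev->holder = NULL;
}
```

In this model, two claims made with one shared cookie (as when every caller
passes sys_swapon) both return 0, so a second swapon of the same device
would go unnoticed. Two claims made with distinct cookies (as when each
caller passes its own swap_info_struct pointer p) give 0 and then -EBUSY,
which is the duplicate check the patch relies on once the loop is removed.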