Re: [Race] data race between blkdev_ioctl() and generic_fadvise()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We also want to provide more information about this data race and hope this could be helpful for you. 

------------------------------------------
Interleavings
With more experiments, we observed two interleavings of these two racy instructions.

Interleaving 1
Writer								Reader
								file->f_ra.ra_pages = bdi->ra_pages;
								// read value = 20
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0

Interleaving 2
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0
								file->f_ra.ra_pages = bdi->ra_pages;
								// read value = 0

In both interleavings, the read value would be passed to file->f_ra.ra_pages. However, in our test input, this variable was never touched after that, so we are not sure what impact it can have afterwards.

------------------------------------------
Test input
The concurrent input behind this data race is a pair of single-thread kernel input. We attach them in Syzkaller’s format here.
 
Input-1
r0 = syz_open_dev$loop(&(0x7f0000000000)='/dev/loop#\x00', 0x0, 0x0)
ioctl$BLKFRASET(r0, 0x1264, 0x0)

Input-2
r0 = syz_open_dev$loop(&(0x7f0000000180)='/dev/loop#\x00', 0x0, 0x0)
fadvise64(r0, 0x0, 0x4130122d, 0x5)


Thanks,
Sishuai

> On Nov 30, 2020, at 9:58 AM, Gong, Sishuai <sishuai@xxxxxxxxxx> wrote:
> 
> Hi,
> 
> We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this is a harmful bug.
> 
> ------------------------------------------
> Writer site
> 
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/block/ioctl.c:573
>        553      case BLKPBSZGET: /* get block device physical block size */
>        554          return put_uint(arg, bdev_physical_block_size(bdev));
>        555      case BLKIOMIN:
>        556          return put_uint(arg, bdev_io_min(bdev));
>        557      case BLKIOOPT:
>        558          return put_uint(arg, bdev_io_opt(bdev));
>        559      case BLKALIGNOFF:
>        560          return put_int(arg, bdev_alignment_offset(bdev));
>        561      case BLKDISCARDZEROES:
>        562          return put_uint(arg, 0);
>        563      case BLKSECTGET:
>        564          max_sectors = min_t(unsigned int, USHRT_MAX,
>        565                      queue_max_sectors(bdev_get_queue(bdev)));
>        566          return put_ushort(arg, max_sectors);
>        567      case BLKROTATIONAL:
>        568          return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
>        569      case BLKRASET:
>        570      case BLKFRASET:
>        571          if(!capable(CAP_SYS_ADMIN))
>        572              return -EACCES;
> ==>    573          bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
>        574          return 0;
>        575      case BLKBSZSET:
>        576          return blkdev_bszset(bdev, mode, argp);
>        577      case BLKPG:
>        578          return blkpg_ioctl(bdev, argp);
>        579      case BLKRRPART:
>        580          return blkdev_reread_part(bdev);
>        581      case BLKGETSIZE:
>        582          size = i_size_read(bdev->bd_inode);
>        583          if ((size >> 9) > ~0UL)
>        584              return -EFBIG;
>        585          return put_ulong(arg, size >> 9);
>        586      case BLKGETSIZE64:
>        587          return put_u64(arg, i_size_read(bdev->bd_inode));
>        588      case BLKTRACESTART:
>        589      case BLKTRACESTOP:
>        590      case BLKTRACESETUP:
>        591      case BLKTRACETEARDOWN:
>        592          return blk_trace_ioctl(bdev, cmd, argp);
>        593      case IOC_PR_REGISTER:
> 
> ------------------------------------------
> Reader site
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/mm/fadvise.c:79
>         66      /*
>         67       * Careful about overflows. Len == 0 means "as much as possible".  Use
>         68       * unsigned math because signed overflows are undefined and UBSan
>         69       * complains.
>         70       */
>         71      endbyte = (u64)offset + (u64)len;
>         72      if (!len || endbyte < len)
>         73          endbyte = -1;
>         74      else
>         75          endbyte--;      /* inclusive */
>         76
>         77      switch (advice) {
>         78      case POSIX_FADV_NORMAL:
> ==>     79          file->f_ra.ra_pages = bdi->ra_pages;
>         80          spin_lock(&file->f_lock);
>         81          file->f_mode &= ~FMODE_RANDOM;
>         82          spin_unlock(&file->f_lock);
>         83          break;
>         84      case POSIX_FADV_RANDOM:
>         85          spin_lock(&file->f_lock);
>         86          file->f_mode |= FMODE_RANDOM;
>         87          spin_unlock(&file->f_lock);
>         88          break;
>         89      case POSIX_FADV_SEQUENTIAL:
>         90          file->f_ra.ra_pages = bdi->ra_pages * 2;
>         91          spin_lock(&file->f_lock);
>         92          file->f_mode &= ~FMODE_RANDOM;
>         93          spin_unlock(&file->f_lock);
>         94          break;
>         95      case POSIX_FADV_WILLNEED:
>         96          /* First and last PARTIAL page! */
>         97          start_index = offset >> PAGE_SHIFT;
>         98          end_index = endbyte >> PAGE_SHIFT;
> 
> 
> 
> ------------------------------------------
> Writer calling trace
> 
> - ksys_ioctl
> -- do_vfs_ioctl
> --- vfs_ioctl
> ---- blkdev_ioctl
> 
> ------------------------------------------
> Reader calling trace
> - ksys_fadvise64_64
> -- vfs_fadvise
> --- generic_fadvise
> 
> 
> 
> Thanks,
> Sishuai
> 





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux