On Mon, Nov 30, 2020 at 03:41:53PM +0000, Gong, Sishuai wrote: > We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this can be a harmful bug. How are you able to reproduce it? Normally mpage_readpage() is only called from a filesystem, and you shouldn't be able to change the size of the blocks in a block device while there's a mounted filesystem. > ------------------------------------------ > Writer site > > /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/block_dev.c:135 > 120 > 121 int set_blocksize(struct block_device *bdev, int size) > 122 { > 123 /* Size must be a power of two, and between 512 and PAGE_SIZE */ > 124 if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size)) > 125 return -EINVAL; > 126 > 127 /* Size cannot be smaller than the size supported by the device */ > 128 if (size < bdev_logical_block_size(bdev)) > 129 return -EINVAL; > 130 > 131 /* Don't change the size if it is same as current */ > 132 if (bdev->bd_block_size != size) { > 133 sync_blockdev(bdev); > 134 bdev->bd_block_size = size; > ==> 135 bdev->bd_inode->i_blkbits = blksize_bits(size); > 136 kill_bdev(bdev); > 137 } > 138 return 0; > 139 } > > ------------------------------------------ > Reader site > > /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/fs/mpage.c:160 > 147 /* > 148 * This is the worker routine which does all the work of mapping the disk > 149 * blocks and constructs largest possible bios, submits them for IO if the > 150 * blocks are not contiguous on the disk. > 151 * > 152 * We pass a buffer_head back and forth and use its buffer_mapped() flag to > 153 * represent the validity of its disk mapping and to decide when to do the next > 154 * get_block() call. > 155 */ > 156 static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) > 157 { > 158 struct page *page = args->page; > 159 struct inode *inode = page->mapping->host; > ==> 160 const unsigned blkbits = inode->i_blkbits; > 161 const unsigned blocks_per_page = PAGE_SIZE >> blkbits; > 162 const unsigned blocksize = 1 << blkbits; > 163 struct buffer_head *map_bh = &args->map_bh; > 164 sector_t block_in_file; > 165 sector_t last_block; > 166 sector_t last_block_in_file; > 167 sector_t blocks[MAX_BUF_PER_PAGE]; > 168 unsigned page_block; > 169 unsigned first_hole = blocks_per_page; > 170 struct block_device *bdev = NULL; > 171 int length; > 172 int fully_mapped = 1; > 173 int op_flags; > 174 unsigned nblocks; > 175 unsigned relative_block; > 176 gfp_t gfp; > 177 > 178 if (args->is_readahead) { > 179 op_flags = REQ_RAHEAD; > 180 gfp = readahead_gfp_mask(page->mapping); > > > ------------------------------------------ > Writer calling trace > > - ksys_mount > -- do_mount > --- vfs_get_tree > ---- mount_bdev > ----- sb_min_blocksize > ------ sb_set_blocksize > ------- set_blocksize > > ------------------------------------------ > Reader calling trace > > - ksys_read > -- vfs_read > --- __vfs_read > ---- generic_file_read_iter > ----- page_cache_sync_readahead > ------ force_page_cache_readahead > ------- __do_page_cache_readahead > -------- read_pages > --------- mpage_readpages > ---------- do_mpage_readpage > > > > Thanks, > Sishuai >