On Wed 01-03-17 15:29:09, Jan Kara wrote:
> On Mon 27-02-17 18:27:55, Al Viro wrote:
> > On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > > Hello,
> > >
> > > The following program triggers GPF in bdi_put:
> > > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
> >
> > What happens is
> >     * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
> > and then promptly destroys the new instance it has created.
> >     * the only inode created on that sucker (root directory, that
> > is) gets evicted.
> >     * most of ->evict_inode() is harmless, until it gets to
> >         if (bdev->bd_bdi != &noop_backing_dev_info)
> >                 bdi_put(bdev->bd_bdi);
>
> Thanks for the analysis!
>
> > added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> > Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> > placed initialization into bdget()), we step into shit of varying nastiness,
> > depending on phase of moon, etc.
>
> Yup, I've missed that the root inode of bdev superblock does not go through
> bdget() (in fact I didn't think what happens with root inode for bdev
> superblock at all) and thus bd_bdi is left uninitialized in that case. I'll
> send a fix for that in a while.
>
> > Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> > anyway?  We set ->bd_bdi to something other than noop_backing_dev_info only
> > in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> > the matching bdi_put() not in __blkdev_put()?  Jan?
>
> The problem is writeback code (from flusher work or through sync(2) -
> generally inode_to_bdi() users) can be looking at bdev inode independently
> from it being open. So if they start looking while the bdev is open but the
> dereference happens after it is closed and device removed, we oops. We have
> seen oopses due to this for quite a while. And all the stuff that is done
> in __blkdev_put() is not enough to prevent writeback code from having a
> look whether there is not something to write.
>
> So what we do now is that once we establish valid bd_bdi reference, we
> leave it alone until bdev inode gets evicted. And to handle the case when
> underlying device actually changes, we unhash bdev inode when the device
> gets removed from the system so that it cannot be found by bdget() anymore.

Attached patch fixes the problem for me. I'll post it officially tomorrow
once Al has a chance to reply...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR

From a533c8dd1fb4dbf840cd3adaf68afb6ad6851ddc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 1 Mar 2017 15:31:11 +0100
Subject: [PATCH] block: Initialize bd_bdi on inode initialization

So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes however for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode as can be seen when doing:

mount -t bdev none /mnt

Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().

Thanks to syzkaller team for finding the problem.

Reported-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
Fixes: b1d2dc5659b41741f5a29b2ade76ffb4e5bb13d8
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/block_dev.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 77c30f15a02c..2eca00ec4370 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -870,6 +870,7 @@ static void init_once(void *foo)
 #ifdef CONFIG_SYSFS
 	INIT_LIST_HEAD(&bdev->bd_holder_disks);
 #endif
+	bdev->bd_bdi = &noop_backing_dev_info;
 	inode_init_once(&ei->vfs_inode);
 	/* Initialize mutex for freeze. */
 	mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -884,8 +885,10 @@ static void bdev_evict_inode(struct inode *inode)
 	spin_lock(&bdev_lock);
 	list_del_init(&bdev->bd_list);
 	spin_unlock(&bdev_lock);
-	if (bdev->bd_bdi != &noop_backing_dev_info)
+	if (bdev->bd_bdi != &noop_backing_dev_info) {
 		bdi_put(bdev->bd_bdi);
+		bdev->bd_bdi = &noop_backing_dev_info;
+	}
 }
 
 static const struct super_operations bdev_sops = {
@@ -988,7 +991,6 @@ struct block_device *bdget(dev_t dev)
 		bdev->bd_contains = NULL;
 		bdev->bd_super = NULL;
 		bdev->bd_inode = inode;
-		bdev->bd_bdi = &noop_backing_dev_info;
 		bdev->bd_block_size = i_blocksize(inode);
 		bdev->bd_part_count = 0;
 		bdev->bd_invalidated = 0;
-- 
2.10.2
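
For anyone who wants to poke at the invariant outside the kernel, here is a
rough user-space sketch of what the patch above relies on. This is not kernel
code: every model_* name is made up for illustration, the refcount is a plain
int, and there is no locking. It only shows that once the constructor sets the
pointer to the sentinel, eviction is safe for an object that never went through
the lookup path (the bdev root inode case), and that resetting the pointer on
eviction keeps a recycled object in the same state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct backing_dev_info: just a refcount. */
struct model_bdi {
	int refcount;
};

/* Sentinel playing the role of noop_backing_dev_info; never refcounted. */
static struct model_bdi model_noop_bdi;

/* Hypothetical stand-in for struct block_device. */
struct model_bdev {
	struct model_bdi *bdi;
};

/* Drop one reference; models bdi_put() without the actual freeing. */
static void model_bdi_put(struct model_bdi *bdi)
{
	assert(bdi != NULL && bdi->refcount > 0);
	bdi->refcount--;
}

/*
 * Constructor run for every object, whether or not it is later found
 * through the lookup path. Models init_once() after the patch.
 */
static void model_init_once(struct model_bdev *bdev)
{
	bdev->bdi = &model_noop_bdi;
}

/* Models the first open: replace the sentinel with a counted reference. */
static void model_open(struct model_bdev *bdev, struct model_bdi *bdi)
{
	if (bdev->bdi == &model_noop_bdi) {
		bdi->refcount++;
		bdev->bdi = bdi;
	}
}

/*
 * Models bdev_evict_inode() after the patch: drop the reference if one
 * was taken, then restore the sentinel so the object can be reused.
 * Returns 1 if a reference was dropped, 0 otherwise.
 */
static int model_evict(struct model_bdev *bdev)
{
	if (bdev->bdi != &model_noop_bdi) {
		model_bdi_put(bdev->bdi);
		bdev->bdi = &model_noop_bdi;
		return 1;
	}
	return 0;
}
```

Calling model_evict() on an object that only saw model_init_once() drops
nothing, which is the root inode case. An object that went through model_open()
gives back exactly one reference and ends up pointing at the sentinel again.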