On Sun, Jan 8, 2017 at 12:50 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > On Sun, Jan 8, 2017 at 11:46 AM, Jan Kara <jack@xxxxxxx> wrote: >> On Fri 06-01-17 09:45:45, Dan Williams wrote: >>> On Fri, Jan 6, 2017 at 2:23 AM, Jan Kara <jack@xxxxxxx> wrote: >>> > On Thu 05-01-17 17:17:55, Dan Williams wrote: >>> >> The ->bd_queue member of struct block_device was added in commit >>> >> 87192a2a49c4 ("vfs: cache request_queue in struct block_device") in >>> >> v3.3. However, blk_get_backing_dev_info() has been using >>> >> bdev_get_queue() and grabbing the request_queue through the gendisk >>> >> since before the git era. >>> >> >>> >> At final __blkdev_put() time ->bd_disk is cleared while ->bd_queue is >>> >> not. The queue remains valid until the final put of the parent disk. >>> >> >>> >> The following crash signature results from blk_get_backing_dev_info() >>> >> trying to lookup the queue through ->bd_disk after the final put of the >>> >> block device. Simply switch bdev_get_queue() to use ->bd_queue directly >>> >> which is guaranteed to still be valid at invalidate_partition() time. >>> >> >>> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000568 >>> >> IP: blk_get_backing_dev_info+0x10/0x20 >>> >> [..] >>> >> Call Trace: >>> >> __inode_attach_wb+0x3a7/0x5d0 >>> >> __filemap_fdatawrite_range+0xf8/0x100 >>> >> filemap_write_and_wait+0x40/0x90 >>> >> fsync_bdev+0x54/0x60 >>> >> ? bdget_disk+0x30/0x40 >>> >> invalidate_partition+0x24/0x50 >>> >> del_gendisk+0xfa/0x230 >>> > >>> > So we have a similar reports of the same problem. E.g.: >>> > >>> > http://www.spinics.net/lists/linux-fsdevel/msg105153.html >>> > >>> > However I kind of miss how your patch would fix all those cases. The >>> > principial problem is that inode_to_bdi() called on block device inode >>> > wants to get the backing_dev_info however on last close of a block device >>> > we do put_disk() and thus the request queue containing backing_dev_info >>> > does not have to be around at that time. In your case you are lucky enough >>> > to have the containing disk still around but that's not the case for all >>> > inode_to_bdi() users (see e.g. the report I referenced) and your patch >>> > would change relatively straightforward NULL pointer dereference to rather >>> > subtle use-after-free issue >>> >>> True. If there are other cases that don't hold their own queue >>> reference this patch makes things worse. >>> >>> > so I disagree with going down this path. >>> >>> I still think this patch is the right thing to do, but it needs to >>> come after the wider guarantee that having an active bdev reference >>> guarantees that the queue and backing_dev_info are still allocated. >>> >>> > So what I think needs to be done is that we make backing_dev_info >>> > independently allocated structure with different lifetime rules to gendisk >>> > or request_queue - definitely I want it to live as long as block device >>> > inode exists. However it needs more thought what the exact lifetime rules >>> > will be. >>> >>> Hmm, why does it need to be separately allocated? >>> >>> Something like this, passes the libnvdimm unit tests: (non-whitespace >>> damaged version attached) >> >> So the problem with this approach is that request queue will be pinned while >> bdev inode exists. And how long that is is impossible to predict or influence >> from userspace so e.g. you cannot remove device driver from memory and even >> unplugging USB after it has been unmounted would suddently go via a path of >> "device removed while it is used" which can have unexpected consequences. I >> guess Jens or Christoph will know more about the details... > > We do have the "block, fs: reliably communicate bdev end-of-life" > effort that I need to revisit: > > http://www.spinics.net/lists/linux-fsdevel/msg93312.html > > ...but I don't immediately see how keeping the request_queue around > longer makes the situation worse? > Well the 0day kbuild robot and further testing with the libnvdimm unit tests shows that bdi code is not ready for this release timing change. So I retract it, sorry for the noise. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html