On Fri, Jul 1, 2016 at 12:14 PM, Howard Cochran <hcochran@xxxxxxxxxxxxxxxx> wrote:
> This crash occurred while writing 1 to /sys/block/sda/device/delete at
> the same instant that another process was closing the block device:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000230
> IP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20
> Oops: 0000 [#1] PREEMPT SMP
> Call Trace:
> [<c112da2a>] ? __filemap_fdatawrite_range+0x15a/0x180
> [<c112d9b5>] ? __filemap_fdatawrite_range+0xe5/0x180
> [<c112dae8>] filemap_write_and_wait+0x38/0x70
> [<c11b79b1>] fsync_bdev+0x41/0x50
> [<c13a4f7c>] invalidate_partition+0x1c/0x40
> [<c13a5d0f>] del_gendisk+0xcf/0x1c0
> [<c15c7143>] sd_remove+0x53/0xb0
> [<c157eaf0>] __device_release_driver+0x80/0x120
> [<c157ebad>] device_release_driver+0x1d/0x30
> [<c157e392>] bus_remove_device+0xb2/0xf0
> [<c157b45c>] device_del+0xec/0x1e0
> [<c13b6d88>] ? kobject_put+0x58/0xc0
> [<c15c12af>] __scsi_remove_device+0xaf/0xc0
> [<c15c12df>] scsi_remove_device+0x1f/0x30
> [<c15c131b>] sdev_store_delete+0x2b/0x40
> [<c15c12f0>] ? scsi_remove_device+0x30/0x30
> [<c157a87f>] dev_attr_store+0x1f/0x40
> ...
> [<c11829bc>] SyS_write+0x4c/0xb0
> EIP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20 SS:ESP 0068:f5eb9d18
>
> It is caused by this race: between the time Thread B's instance of
> filemap_write_and_wait() has asked whether there are any pages to flush and
> the time it dereferences bdev->bd_disk, Thread A can clear that pointer in
> __blkdev_put().
>
> Thread A:                        Thread B:
> blkdev_close()                   sdev_store_delete()
>  blkdev_put()                     sd_remove()
>   __blkdev_put()                   del_gendisk()
>    mutex_lock(bd_mutex);            invalidate_partition()
>    sync_blkdev()                     fsync_bdev()
>     filemap_write_and_wait()          filemap_write_and_wait()
>      if (mapping has pages)            if (mapping has pages)
>       deref bdev->bd_disk (OK)
>    Set bdev->bd_disk = NULL;
>    mutex_unlock(bd_mutex);              deref bdev->bd_disk (BOOM!)
>
> The "dereference bdev->bd_disk" occurs on this sub-chain:
>   filemap_write_and_wait()
>    __filemap_fdatawrite_range()
>     mapping_cap_writeback_dirty()
>      inode_to_bdi()
>       bdev_get_queue()
>        return bdev->bd_disk->queue;
>
> The problem was introduced by de1414a654e6 ("fs: export inode_to_bdi and use
> it in favor of mapping->backing_dev_info"). Before that change,
> mapping_cap_writeback_dirty() directly retrieved the backing_dev_info from
> the mapping rather than looking it up through
> mapping->host->inode_dev->bdev->bd_disk->queue.
>
> This was found while running a stress test on an ARM-based embedded system
> which involved repeatedly shutting down many services simultaneously via
> systemd isolate (thereby making it likely that "Thread B" was preempted for
> a while just before it dereferenced bdev->bd_disk). I subsequently reproduced
> this on vanilla Linux 4.6 in QEMU/x86.
>
> This patch fixes the race by making sd_remove() hold bd_mutex during the
> call to del_gendisk().
>
> Fixes: de1414a654e6 ("fs: export inode_to_bdi and use it in favor of
> mapping->backing_dev_info")
> Signed-off-by: Howard Cochran <hcochran@xxxxxxxxxxxxxxxx>
> Cc: Howard Cochran <cochran@xxxxxxxxxxx>
> Cc: linux-scsi@xxxxxxxxxxxxxxx
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: James Bottomley <JBottomley@xxxxxxxx>
> Cc: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
> ---
>  drivers/scsi/sd.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index f52b74c..0f53925 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3126,6 +3126,7 @@ static int sd_remove(struct device *dev)
>  {
>  	struct scsi_disk *sdkp;
>  	dev_t devt;
> +	struct block_device *bdev;
>  
>  	sdkp = dev_get_drvdata(dev);
>  	devt = disk_devt(sdkp->disk);
> @@ -3134,7 +3135,13 @@ static int sd_remove(struct device *dev)
>  	async_synchronize_full_domain(&scsi_sd_pm_domain);
>  	async_synchronize_full_domain(&scsi_sd_probe_domain);
>  	device_del(&sdkp->dev);
> +
> +	bdev = bdget_disk(sdkp->disk, 0);
> +	mutex_lock(&bdev->bd_mutex);
>  	del_gendisk(sdkp->disk);
> +	mutex_unlock(&bdev->bd_mutex);
> +	bdput(bdev);
> +
>  	sd_shutdown(dev);
>  
>  	blk_register_region(devt, SD_MINORS, NULL,
> --
> 1.9.1
>

Adding to Cc: Corrected email address for James Bottomley.