This crash occurred while writing 1 to /sys/block/sda/device/delete at the same instant that another process was closing the block device: BUG: unable to handle kernel NULL pointer dereference at 00000230 IP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20 Oops: 0000 [#1] PREEMPT SMP Call Trace: [<c112da2a>] ? __filemap_fdatawrite_range+0x15a/0x180 [<c112d9b5>] ? __filemap_fdatawrite_range+0xe5/0x180 [<c112dae8>] filemap_write_and_wait+0x38/0x70 [<c11b79b1>] fsync_bdev+0x41/0x50 [<c13a4f7c>] invalidate_partition+0x1c/0x40 [<c13a5d0f>] del_gendisk+0xcf/0x1c0 [<c15c7143>] sd_remove+0x53/0xb0 [<c157eaf0>] __device_release_driver+0x80/0x120 [<c157ebad>] device_release_driver+0x1d/0x30 [<c157e392>] bus_remove_device+0xb2/0xf0 [<c157b45c>] device_del+0xec/0x1e0 [<c13b6d88>] ? kobject_put+0x58/0xc0 [<c15c12af>] __scsi_remove_device+0xaf/0xc0 [<c15c12df>] scsi_remove_device+0x1f/0x30 [<c15c131b>] sdev_store_delete+0x2b/0x40 [<c15c12f0>] ? scsi_remove_device+0x30/0x30 [<c157a87f>] dev_attr_store+0x1f/0x40 ... [<c11829bc>] SyS_write+0x4c/0xb0 EIP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20 SS:ESP 0068:f5eb9d18 It is caused by this race: Between the time Thread B's instance of filemap_write_and_wait() has asked whether there are any pages to flush and when it it dereferences bdev->disk, Thread A can clear that pointer in __blkdev_put(). Thread A: Thread B: blkdev_close() sdev_store_delete() blkdev_put() sd_remove() __blkdev_put() del_gendisk() mutex_lock(bd_mutex); invalidate_partition() sync_blkdev() fsync_bdev() filemap_write_and_wait() filemap_write_and_wait() if (mapping has pages) if (mapping has pages) deref bdev->disk (OK) Set bdev->bd_disk = NULL; mutex_unlock(bd_mutex); deref. bdev->bd_disk (BOOM!) The "dereference bdev->disk" occurs on this sub-chain: filemap_write_and_wait() __filemap_fdatawrite_range() mapping_cap_writeback_dirty() inode_to_bdi() bdev_get_queue() return bdev->disk->queue; The problem was introduced by de1414a654e6 ("fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info"). Before that change, mapping_cap_writeback_dirty() directly retrieved the backing_dev_info from the mapping rather than looking it up through mapping->host->inode_dev->bdev->bd_disk->queue. This was found while running a stress test on an ARM-based embedded system which involved repeatedly shutting down many services simultaneously via systemd isolate (thereby making it likely that "Thread B" was preempted for awhile just before it dereferenced bdev->bd_disk). I subsequently reproduced this on vanilla Linux 4.6 in QEMU/x86. This patch fixes the race by making sd_remove() hold bd_mutex during the call to del_gendisk(). Fixes: de1414a654e6 ("fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info") Signed-off-by: Howard Cochran <hcochran@xxxxxxxxxxxxxxxx> Cc: Howard Cochran <cochran@xxxxxxxxxxx> Cc: linux-scsi@xxxxxxxxxxxxxxx Cc: Christoph Hellwig <hch@xxxxxx> Cc: James Bottomley <JBottomley@xxxxxxxx> Cc: Martin K. Petersen <martin.petersen@xxxxxxxxxx> --- drivers/scsi/sd.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index f52b74c..0f53925 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3126,6 +3126,7 @@ static int sd_remove(struct device *dev) { struct scsi_disk *sdkp; dev_t devt; + struct block_device *bdev; sdkp = dev_get_drvdata(dev); devt = disk_devt(sdkp->disk); @@ -3134,7 +3135,13 @@ static int sd_remove(struct device *dev) async_synchronize_full_domain(&scsi_sd_pm_domain); async_synchronize_full_domain(&scsi_sd_probe_domain); device_del(&sdkp->dev); + + bdev = bdget_disk(sdkp->disk, 0); + mutex_lock(&bdev->bd_mutex); del_gendisk(sdkp->disk); + mutex_unlock(&bdev->bd_mutex); + bdput(bdev); + sd_shutdown(dev); blk_register_region(devt, SD_MINORS, NULL, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html