From: Ye Bin <yebin10@xxxxxxxxxx>

The following issue occurs when an NVMe device is formatted while I/O is in flight:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 101727f067 P4D 1011fae067 PUD fbed78067 PMD 0
Oops: 0000 [#1] SMP NOPTI
RIP: 0010:kfree+0x4f/0x160
RSP: 0018:ff705a800912b910 EFLAGS: 00010247
RAX: 0000000000000000 RBX: 0d06d30000000000 RCX: ff4fb320260ad990
RDX: ff4fb30ee7acba40 RSI: 0000000000000000 RDI: 00b04cff80000000
RBP: ff4fb30ee7acba40 R08: 0000000000000200 R09: ff705a800912bb60
R10: 0000000000000000 R11: ff4fb3103b67c750 R12: ffffffff9a62d566
R13: ff4fb30aa0530000 R14: 0000000000000000 R15: 000000000000000a
FS:  00007f4399b6b700(0000) GS:ff4fb31040140000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000001014cd4002 CR4: 0000000000761ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 bio_integrity_free+0xa6/0xb0
 __bio_integrity_endio+0x8c/0xa0
 bio_endio+0x2b/0x130
 blk_update_request+0x78/0x2b0
 blk_mq_end_request+0x1a/0x140
 blk_mq_try_issue_directly+0x5d/0xc0
 blk_mq_make_request+0x46b/0x540
 generic_make_request+0x121/0x300
 submit_bio+0x6c/0x140
 __blkdev_direct_IO_simple+0x1ca/0x3a0
 blkdev_direct_IO+0x3d9/0x460
 generic_file_read_iter+0xb4/0xc60
 new_sync_read+0x121/0x170
 vfs_read+0x89/0x130
 ksys_read+0x52/0xc0
 do_syscall_64+0x5d/0x1d0
 entry_SYSCALL_64_after_hwframe+0x65/0xca

Assume a 512-byte direct I/O is issued while the logical block size of the
block device is initially 512 bytes and is then changed to 4096 bytes.
The above issue may happen as follows:

Direct read                                format NVMe
__blkdev_direct_IO_simple(iocb, iter, nr_pages);
  if ((pos | iov_iter_alignment(iter)) & (bdev_logical_block_size(bdev) - 1))
  -->The logical block size is 512 and the I/O issued is 512 bytes,
     so the alignment check passes
    return -EINVAL;
  submit_bio(&bio);
                                           nvme_dev_ioctl
                                             case NVME_IOCTL_RESCAN:
                                               nvme_queue_scan(ctrl);
                                                 ...
                                                 nvme_update_disk_info(disk, ns, id);
                                                   blk_queue_logical_block_size(disk->queue, bs);
                                                   --> 512->4096
    blk_queue_enter(q, flags)
    blk_mq_make_request(q, bio)
      bio_integrity_prep(bio)
        len = bio_integrity_bytes(bi, bio_sectors(bio));
        -->Because the logical block size has now increased to 4096 bytes,
           the calculated 'len' is 0
        buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
        -->Called with len=0, so it returns buf=16 (ZERO_SIZE_PTR)
        end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
        start = ((unsigned long) buf) >> PAGE_SHIFT;
        nr_pages = end - start;
        -->nr_pages == 1
        bip->bip_flags |= BIP_BLOCK_INTEGRITY;
        for (i = 0 ; i < nr_pages ; i++) {
          if (len <= 0)
          -->The bip_vec of the bio_integrity is never initialized, which
             results in a NULL pointer access when it is released later.
             Even if it were initialized, the later release would still
             access a NULL pointer because the buffer address is invalid.
            break;

First, formatting an NVMe device while I/O is in flight is unreasonable. With
this kind of concurrency, however, the block layer can see an I/O that is
smaller than the logical block size. The device is expected to return an
error for such an I/O, so the block layer should also handle it gracefully
instead of letting a NULL pointer access crash the system.

The root cause of this issue is the concurrency between the I/O submission
path and the block size update. However, this concurrency does not occur in
real production environments.

To solve the above issue, verify that the segments of the bio are aligned
with the integrity intervals.

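The arithmetic in the trace above can be checked with a small userspace model. This is only a sketch, not kernel code: the `model_*` helpers are hypothetical names, the interval computation mirrors the shift that `bio_integrity_intervals()` performs on 512-byte sectors, and the 4096-byte page size and the 8-byte tuple size used in the usage note are assumptions.

```c
#include <assert.h>

#define SECTOR_SHIFT 9                  /* bio_sectors() counts 512-byte sectors */
#define MODEL_PAGE_SHIFT 12             /* assumption: 4096-byte pages */
#define MODEL_PAGE_SIZE (1UL << MODEL_PAGE_SHIFT)
#define MODEL_ZERO_SIZE_PTR 16UL        /* kmalloc(0) returns ZERO_SIZE_PTR, i.e. (void *)16 */

/* Hypothetical model: number of integrity intervals covered by 'sectors'. */
static unsigned int model_intervals(unsigned int interval_exp,
                                    unsigned int sectors)
{
    assert(interval_exp >= SECTOR_SHIFT);
    return sectors >> (interval_exp - SECTOR_SHIFT);
}

/* Hypothetical model: bytes of protection data needed for 'sectors'. */
static unsigned int model_integrity_bytes(unsigned int interval_exp,
                                          unsigned int tuple_size,
                                          unsigned int sectors)
{
    return model_intervals(interval_exp, sectors) * tuple_size;
}

/* Same page-count arithmetic as in bio_integrity_prep(). */
static unsigned long model_nr_pages(unsigned long buf, unsigned int len)
{
    unsigned long end = (buf + len + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
    unsigned long start = buf >> MODEL_PAGE_SHIFT;

    return end - start;
}
```

With a 4096-byte interval (`interval_exp` of 12) and one 512-byte sector, `model_integrity_bytes(12, 8, 1)` evaluates to 0, and `model_nr_pages(MODEL_ZERO_SIZE_PTR, 0)` evaluates to 1, which matches the `len` and `nr_pages` values described above.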
Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
---
 block/bio-integrity.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 2e3e8e04961e..00a0d1bafe06 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -431,7 +431,7 @@ bool bio_integrity_prep(struct bio *bio)
 	void *buf;
 	unsigned long start, end;
 	unsigned int len, nr_pages;
-	unsigned int bytes, offset, i;
+	unsigned int bytes, offset, i, intervals;
 
 	if (!bi)
 		return true;
@@ -457,7 +457,13 @@ bool bio_integrity_prep(struct bio *bio)
 	}
 
 	/* Allocate kernel buffer for protection data */
-	len = bio_integrity_bytes(bi, bio_sectors(bio));
+	intervals = bio_integrity_intervals(bi, bio_sectors(bio));
+	if (unlikely((bio->bi_vcnt && intervals < bio->bi_vcnt) ||
+		     (!bio->bi_vcnt && intervals < bio_segments(bio)))) {
+		printk(KERN_ERR"BIO segments are not aligned according to integrity interval\n");
+		goto err_end_io;
+	}
+	len = intervals * bi->tuple_size;
 	buf = kmalloc(len, GFP_NOIO);
 	if (unlikely(buf == NULL)) {
 		printk(KERN_ERR "could not allocate integrity buffer\n");
-- 
2.31.1
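For readers who want to reason about the new condition outside the kernel, it can be modelled in userspace roughly as follows. This is a hedged sketch only: `model_bio_passes_check()` and its parameters are hypothetical stand-ins for `bio->bi_vcnt`, `bio_segments(bio)`, `bio_sectors(bio)` and the integrity profile's interval exponent, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check added to bio_integrity_prep().
 *   interval_exp  : log2 of the integrity interval in bytes (9 or 12 here)
 *   sectors       : bio size in 512-byte sectors
 *   vcnt          : stand-in for bio->bi_vcnt (0 if the bio has no own vecs)
 *   iter_segments : stand-in for bio_segments(bio)
 * Returns true when the bio would pass the check, false when it would be
 * failed through err_end_io.
 */
static bool model_bio_passes_check(unsigned int interval_exp,
                                   unsigned int sectors,
                                   unsigned int vcnt,
                                   unsigned int iter_segments)
{
    unsigned int intervals;

    assert(interval_exp >= 9);
    intervals = sectors >> (interval_exp - 9);

    /* Use bi_vcnt when it is set, otherwise fall back to the segment count. */
    if (vcnt)
        return intervals >= vcnt;
    return intervals >= iter_segments;
}
```

In this model, a one-segment, 512-byte bio against a 4096-byte interval gives zero intervals and is rejected, while the same bio against a 512-byte interval passes. Note that the condition compares counts only; it does not inspect the length of each individual segment.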