On 11/12/19 9:46 AM, Ming Lei wrote:
> On Tue, Nov 12, 2019 at 07:19:58AM +0000, Junichi Nomura wrote:
>> __bio_try_merge_page() may merge a page to bio without bio_full() check
>> and cause bi_size overflow.
>>
>> The overflow typically ends up with sd_init_command() warning on zero
>> segment request with call trace like this:
>>
>>     ------------[ cut here ]------------
>>     WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
>>     CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
>>     Workqueue: kblockd blk_mq_run_work_fn
>>     RIP: 0010:scsi_init_io+0x156/0x180
>>     RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
>>     RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
>>     RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
>>     RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
>>     R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
>>     R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
>>     FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
>>     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>     CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
>>     Call Trace:
>>      sd_init_command+0x326/0xb40 [sd_mod]
>>      scsi_queue_rq+0x502/0xaa0
>>      ? blk_mq_get_driver_tag+0xe7/0x120
>>      blk_mq_dispatch_rq_list+0x256/0x5a0
>>      ? elv_rb_del+0x24/0x30
>>      ? deadline_remove_request+0x7b/0xc0
>>      blk_mq_do_dispatch_sched+0xa3/0x140
>>      blk_mq_sched_dispatch_requests+0xfb/0x170
>>      __blk_mq_run_hw_queue+0x81/0x130
>>      blk_mq_run_work_fn+0x1b/0x20
>>      process_one_work+0x179/0x390
>>      worker_thread+0x4f/0x3e0
>>      kthread+0x105/0x140
>>      ? max_active_store+0x80/0x80
>>      ? kthread_bind+0x20/0x20
>>      ret_from_fork+0x35/0x40
>>     ---[ end trace f9036abf5af4a4d3 ]---
>>     blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
>>     XFS (sdd1): writeback error on sector 2875552
>>
>> __bio_try_merge_page() should check the overflow before actually doing
>> merge.
>>
>> Fixes: 07173c3ec276c ("block: enable multipage bvecs")
>> Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx>
>> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxxxxx>
>>
>> diff --git a/block/bio.c b/block/bio.c
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -751,7 +751,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page *page,
>>  	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>>  		return false;
>>
>> -	if (bio->bi_vcnt > 0) {
>> +	if (bio->bi_vcnt > 0 && !bio_full(bio, len)) {
>>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
>>
>>  		if (page_is_mergeable(bv, page, len, off, same_page)) {
>>
>
> Looks fine:
>
> Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
>

Oh f**k. That is the bug I've been hunting for years now.

Thanks Junichi!

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@xxxxxxx                   +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
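
For anyone reading this thread later, the overflow that the patch guards against can be sketched in user space. This is only a toy model and not kernel code: struct toy_bio and the toy_* functions are hypothetical names invented for illustration, the overflow test inside toy_bio_full() is an assumption about what bio_full() checks (no free bvec slot, or the 32-bit byte count plus len would exceed UINT_MAX), and the page_is_mergeable() test from the real function is left out entirely.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct bio; not the kernel type. */
struct toy_bio {
	unsigned int size;       /* models bio->bi_iter.bi_size, a 32-bit byte count */
	unsigned short vcnt;     /* models bio->bi_vcnt */
	unsigned short max_vecs; /* models bio->bi_max_vecs */
};

/*
 * Assumed shape of the bio_full() check: the bio is full when it has no
 * free bvec slot, or when adding len bytes would overflow the byte count.
 */
static bool toy_bio_full(const struct toy_bio *bio, unsigned int len)
{
	if (bio->vcnt >= bio->max_vecs)
		return true;
	if (bio->size > UINT_MAX - len)
		return true;
	return false;
}

/* Pre-patch condition: merge into the last segment whenever one exists. */
static bool toy_merge_unchecked(struct toy_bio *bio, unsigned int len)
{
	if (bio->vcnt > 0) {
		bio->size += len; /* unsigned arithmetic wraps silently */
		return true;
	}
	return false;
}

/* Patched condition: refuse the merge when the bio is already full. */
static bool toy_merge_checked(struct toy_bio *bio, unsigned int len)
{
	if (bio->vcnt > 0 && !toy_bio_full(bio, len)) {
		bio->size += len;
		return true;
	}
	return false;
}
```

With size set to UINT_MAX - 4095 and len set to 4096, toy_merge_unchecked() reports success and leaves size at 0, while toy_merge_checked() returns false and leaves size untouched. A byte count that wraps to zero is at least consistent with the zero segment request in the quoted trace, although this toy does not model how the block layer counts segments.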