Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call"), it was valid to call blk_bvec_map_sg() with an offset equal to or larger than the page size. blk_bvec_map_sg() would then adjust page and offset by calling bvec_nth_page(). This is no longer possible. Large offsets are now directly passed without validation to sg_set_page(). This results in data corruption and crashes such as the following, observed when trying to boot arm:vexpress from mmc. Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 0 PID: 1 Comm: init Tainted: G W 5.1.0-rc5-next-20190418-dirty #1 Hardware name: ARM-Versatile Express [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14) [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8) [<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328) [<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08) [<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc) [<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4) [<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0) [<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20) The above does not help at all when trying to track down the problem. Let's at least add a warning to generate a traceback. The crash will still be observed, but at least we'll have a hint what is causing it. With this patch applied, we'll see something like the following prior to the crash. WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac page offset 19456 larger than page size 4096 Modules linked in: CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1 Hardware name: ARM-Versatile Express Workqueue: kblockd blk_mq_run_work_fn [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14) [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8) [<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8) [<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70) [<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac) [<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8) [<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178) [<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880) [<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230) [<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0) [<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c) [<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198) [<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170) [<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4) [<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528) [<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164) [<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c) Exception stack(0xc58d5fb0 to 0xc58d5ff8) 5fa0: 00000000 00000000 00000000 00000000 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ---[ end trace ef7077b559d20ddc ]--- This patch is needed even if the call from mmc_blk_data_prep() is fixed. We'll have to expect further problems from callers who are not aware about the API change. Those problems will be difficult to track down without extra information. Use WARN_ONCE to report the problem. If the problem is seen, it may be seen many times, and we don't want to flood the kernel log with repeated messages. Cc: Christoph Hellwig <hch@xxxxxx> Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx> --- I don't know if there are situations where offset >= PAGE_SIZE is acceptable. A quick comparison with current mainline did not report any instances where offset >= PAGE_SIZE is passed to sg_set_page(). I do see a number of warnings triggered by this patch in -next (for example with alpha, m68k, ppc64, and riscv64 emulations) which do not (or not immediately) result in a crash. If there is a different / better means to detect a problem, please let me know and I'll be happy to adjust the patch accordingly. block/blk-merge.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/blk-merge.c b/block/blk-merge.c index 247b17f2a0f6..9432cd88bd7e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -475,6 +475,10 @@ static unsigned blk_bvec_map_sg(struct request_queue *q, unsigned offset = bvec->bv_offset + total; unsigned len = min(get_max_segment_size(q, offset), nbytes); + WARN_ONCE(offset >= PAGE_SIZE, + "page offset %u larger than or equal to page size %ld\n", + offset, PAGE_SIZE); + *sg = blk_next_sg(sg, sglist); sg_set_page(*sg, bvec->bv_page, len, offset); -- 2.7.4