[PATCH -next] block: Warn if page offset is equal to or larger than page size

Guenter Roeck <linux@xxxxxxxxxxxx> · Thu, 18 Apr 2019 09:58:41 -0700

Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a
nth_page call"), it was valid to call blk_bvec_map_sg() with an offset
equal to or larger than the page size. blk_bvec_map_sg() would then adjust
page and offset by calling bvec_nth_page(). This is no longer possible.
Large offsets are now directly passed without validation to sg_set_page().
This results in data corruption and crashes such as the following,
observed when trying to boot arm:vexpress from mmc.

Kernel panic - not syncing: Attempted to kill init!  exitcode=0x0000000b
CPU: 0 PID: 1 Comm: init Tainted: G        W
5.1.0-rc5-next-20190418-dirty #1
Hardware name: ARM-Versatile Express
[<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
[<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
[<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328)
[<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08)
[<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc)
[<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4)
[<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0)
[<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20)

The above does not help at all when trying to track down the problem.
Let's at least add a warning to generate a traceback. The crash will
still be observed, but at least we'll have a hint what is causing it.
With this patch applied, we'll see something like the following
prior to the crash.

WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac
page offset 19456 larger than page size 4096
Modules linked in:
CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1
Hardware name: ARM-Versatile Express
Workqueue: kblockd blk_mq_run_work_fn
[<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14)
[<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8)
[<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8)
[<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70)
[<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac)
[<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8)
[<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178)
[<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880)
[<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230)
[<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0)
[<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c)
[<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198)
[<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170)
[<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4)
[<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528)
[<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164)
[<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc58d5fb0 to 0xc58d5ff8)
5fa0:                                     00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace ef7077b559d20ddc ]---

This patch is needed even if the call from mmc_blk_data_prep() is fixed.
We'll have to expect further problems from callers who are not aware about
the API change. Those problems will be difficult to track down without
extra information.

Use WARN_ONCE to report the problem. If the problem is seen, it may be
seen many times, and we don't want to flood the kernel log with repeated
messages.

Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
---
I don't know if there are situations where offset >= PAGE_SIZE
is acceptable. A quick comparison with current mainline did not report
any instances where offset >= PAGE_SIZE is passed to sg_set_page().
I do see a number of warnings triggered by this patch in -next
(for example with alpha, m68k, ppc64, and riscv64 emulations) which
do not (or not immediately) result in a crash.

If there is a different / better means to detect a problem, please
let me know and I'll be happy to adjust the patch accordingly.

 block/blk-merge.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 247b17f2a0f6..9432cd88bd7e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -475,6 +475,10 @@ static unsigned blk_bvec_map_sg(struct request_queue *q,
 		unsigned offset = bvec->bv_offset + total;
 		unsigned len = min(get_max_segment_size(q, offset), nbytes);
 
+		WARN_ONCE(offset >= PAGE_SIZE,
+			  "page offset %u larger than or equal to page size %ld\n",
+			  offset, PAGE_SIZE);
+
 		*sg = blk_next_sg(sg, sglist);
 		sg_set_page(*sg, bvec->bv_page, len, offset);
 
-- 
2.7.4