On Fri, Apr 19, 2019 at 12:59 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > Prior to commit 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a > nth_page call"), it was valid to call blk_bvec_map_sg() with an offset > equal to or larger than the page size. blk_bvec_map_sg() would then adjust > page and offset by calling bvec_nth_page(). This is no longer possible. > Large offsets are now directly passed without validation to sg_set_page(). > This results in data corruption and crashes such as the following, > observed when trying to boot arm:vexpress from mmc. The issue is in scatterlist code, which claims(and is supposed) to support >=PAGE_SIZE offset and start page. However, there is bug somewhere. > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b > CPU: 0 PID: 1 Comm: init Tainted: G W > 5.1.0-rc5-next-20190418-dirty #1 > Hardware name: ARM-Versatile Express > [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14) > [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8) > [<c10daad8>] (dump_stack) from [<c03485fc>] (panic+0x110/0x328) > [<c03485fc>] (panic) from [<c034d550>] (do_exit+0xbfc/0xc08) > [<c034d550>] (do_exit) from [<c034e608>] (do_group_exit+0x3c/0xbc) > [<c034e608>] (do_group_exit) from [<c035ace8>] (get_signal+0x13c/0x9c4) > [<c035ace8>] (get_signal) from [<c030cd3c>] (do_work_pending+0x1a8/0x5e0) > [<c030cd3c>] (do_work_pending) from [<c030106c>] (slow_work_pending+0xc/0x20) > > The above does not help at all when trying to track down the problem. > Let's at least add a warning to generate a traceback. The crash will > still be observed, but at least we'll have a hint what is causing it. > With this patch applied, we'll see something like the following > prior to the crash. > > WARNING: CPU: 0 PID: 84 at block/blk-merge.c:480 blk_rq_map_sg+0x3cc/0x6ac > page offset 19456 larger than page size 4096 > Modules linked in: > CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 5.1.0-rc5-next-20190418-dirty #1 > Hardware name: ARM-Versatile Express > Workqueue: kblockd blk_mq_run_work_fn > [<c0313188>] (unwind_backtrace) from [<c030d920>] (show_stack+0x10/0x14) > [<c030d920>] (show_stack) from [<c10daad8>] (dump_stack+0xb4/0xc8) > [<c10daad8>] (dump_stack) from [<c03483ac>] (__warn+0xe0/0xf8) > [<c03483ac>] (__warn) from [<c0348410>] (warn_slowpath_fmt+0x4c/0x70) > [<c0348410>] (warn_slowpath_fmt) from [<c078fb5c>] (blk_rq_map_sg+0x3cc/0x6ac) > [<c078fb5c>] (blk_rq_map_sg) from [<c0eadeec>] (mmc_blk_data_prep+0x1b0/0x2c8) > [<c0eadeec>] (mmc_blk_data_prep) from [<c0eae054>] (mmc_blk_rw_rq_prep+0x50/0x178) > [<c0eae054>] (mmc_blk_rw_rq_prep) from [<c0eb145c>] (mmc_blk_mq_issue_rq+0x294/0x880) > [<c0eb145c>] (mmc_blk_mq_issue_rq) from [<c0eb1e78>] (mmc_mq_queue_rq+0x128/0x230) > [<c0eb1e78>] (mmc_mq_queue_rq) from [<c07957f8>] (blk_mq_dispatch_rq_list+0x3b4/0x5e0) > [<c07957f8>] (blk_mq_dispatch_rq_list) from [<c079ab3c>] (blk_mq_do_dispatch_sched+0x70/0x10c) > [<c079ab3c>] (blk_mq_do_dispatch_sched) from [<c079b238>] (blk_mq_sched_dispatch_requests+0x11c/0x198) > [<c079b238>] (blk_mq_sched_dispatch_requests) from [<c0793e20>] (__blk_mq_run_hw_queue+0xe0/0x170) > [<c0793e20>] (__blk_mq_run_hw_queue) from [<c03665d8>] (process_one_work+0x284/0x6b4) > [<c03665d8>] (process_one_work) from [<c0367914>] (worker_thread+0x44/0x528) > [<c0367914>] (worker_thread) from [<c036cc6c>] (kthread+0x15c/0x164) > [<c036cc6c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c) > Exception stack(0xc58d5fb0 to 0xc58d5ff8) > 5fa0: 00000000 00000000 00000000 00000000 > 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > ---[ end trace ef7077b559d20ddc ]--- > > This patch is needed even if the call from mmc_blk_data_prep() is fixed. > We'll have to expect further problems from callers who are not aware about > the API change. Those problems will be difficult to track down without > extra information. > > Use WARN_ONCE to report the problem. If the problem is seen, it may be > seen many times, and we don't want to flood the kernel log with repeated > messages. > > Cc: Christoph Hellwig <hch@xxxxxx> > Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx> > --- > I don't know if there are situations where offset >= PAGE_SIZE > is acceptable. A quick comparison with current mainline did not report > any instances where offset >= PAGE_SIZE is passed to sg_set_page(). > I do see a number of warnings triggered by this patch in -next > (for example with alpha, m68k, ppc64, and riscv64 emulations) which > do not (or not immediately) result in a crash. > > If there is a different / better means to detect a problem, please > let me know and I'll be happy to adjust the patch accordingly. > > block/blk-merge.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 247b17f2a0f6..9432cd88bd7e 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -475,6 +475,10 @@ static unsigned blk_bvec_map_sg(struct request_queue *q, > unsigned offset = bvec->bv_offset + total; > unsigned len = min(get_max_segment_size(q, offset), nbytes); > > + WARN_ONCE(offset >= PAGE_SIZE, > + "page offset %u larger than or equal to page size %ld\n", > + offset, PAGE_SIZE); > + > *sg = blk_next_sg(sg, sglist); > sg_set_page(*sg, bvec->bv_page, len, offset); No, it shouldn't be difficult to trigger >= PAGE_SIZE offset, and it is definitely correct. Thanks, Ming Lei