(+Damien Le Moal) (more context at https://lore.kernel.org/linux-block/CAG_fn=VBHmBgqLi35tD27NRLH2tEZLH=Y+rTfZ3rKNz9ipG+jQ@xxxxxxxxxxxxxx/) On Mon, Nov 25, 2019 at 5:01 AM Chaitanya Kulkarni <Chaitanya.Kulkarni@xxxxxxx> wrote: > > On 11/22/2019 03:58 AM, Alexander Potapenko wrote: > > I'm using the following patch in KMSAN tree to suppress these bugs: > > > > diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c > > index 0e7da5015ccd5..9e8e99bdc0db3 100644 > > --- a/drivers/block/null_blk_main.c > > +++ b/drivers/block/null_blk_main.c > > @@ -1235,8 +1235,16 @@ static blk_status_t null_handle_cmd(struct > > nullb_cmd *cmd, sector_t sector, > > if (dev->memory_backed) > > cmd->error = null_handle_memory_backed(cmd, op); > > > > - if (!cmd->error && dev->zoned) > > + if (!cmd->error && dev->zoned) { > > cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); > > + } else if (dev->queue_mode != NULL_Q_BIO) { > > + /* > > + * TODO(glider): this is a hack to suppress bugs in nullb > > + * driver. I have no idea what I'm doing, better wait for a > > + * proper fix from Jens Axboe. > > + */ > > + cmd->error = null_handle_rq(cmd); > > + } > > This is just based on what I read in the code, I don't have > the kmsan support. > > null_handle_rq() should not be called without checking the > dev->memory_backed value since it handles memory backed operation, > (it may be working because of the correct error handling done > in the copy_from_nullb() when t_page == NULL). Thanks for the explanation! The code has changed recently, and my patch does not apply anymore, yet the problem still persists. I ended up just calling null_handle_rq() at the end of null_process_cmd(), but we probably need a cleaner fix. > Possible explanation could be for the above code fixing the problem > is:- > > null_handle_cmd() > None of the ifs are handled (since it is default behavior) > calls null_handle_rq() > Processes each bvec in this request :- > rq_for_each_segment() > calls null_transfer() > calls copy_from_null() > > Now here in copy_from_null(), null_lookup_page() returns > NULL since it is not present in the radix tree(dev and cache > since its !membacked && !cache_size I assume since I don't have > the command line of modprobe null_blk), > > and then does the dest memset():- > > (from linux-block for-next) > 1009 if (!t_page) { > 1010 memset(dst + off + count, 0, temp); > 1011 goto next; > 1012 } > > which initializes the destination page to 0. This is what Jens > is referring to I guess that we need to memset() it for the > read when !membacked. > > It will be great if you can verify the above code path on your > setup with above memset(). > > The above patch does fix the problem but it is very confusing. > > What we need is to traverse the rq bvec and actually initialize > the bvec.page and repeat the pattern present in the copy_from_null() > in such a way this common code is shared between membacked and > non-membacked mode for REQ_OP_READ. > > Jens can correct me if I'm wrong. > > I'll be happy to look into this if Jens is okay. > > > > > It appears to fix the problem, but I'm not really sure this is the > > right thing to do (e.g. whether it covers all invocations of > > null_handle_cmd(), probably not) > > I don't see an easy way to memset() the pages being read, as there're > > no pages at this point. > > > > > >> >-- > -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg