On Mon, Jun 14, 2021 at 01:37:06PM +0200, Daniel Wagner wrote:
> On Tue, Jun 08, 2021 at 08:33:39PM +0200, Daniel Wagner wrote:
> > cpumask_first_and() returns >= nr_cpu_ids if the two provided masks do
> > not share a common bit. Verify we get a valid value back from
> > cpumask_first_and().
>
> So I got feedback on this issue (but not on the patch itself yet). The
> system starts with 16 virtual CPU cores, and during the test 4 cores are
> removed[1]. As soon as there is an error on the storage side, the reset
> code on the host ends up in this path and crashes. I still don't
> understand why the CPU removal is not updating the CPU mask correctly
> before we hit the reset path. I'll continue to investigate.

We don't update hctx->cpumask when a CPU is added or removed; it is
assigned from cpu_possible_mask from the beginning. This is a
long-standing issue, which can be triggered when all CPUs in
hctx->cpumask become offline. The thing is that only
nvmf_connect_io_queue() allocates a request via a specified hctx.

thanks,
Ming
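
For reference, a minimal sketch of the check the patch description calls
for, assuming a blk_mq_alloc_request_hctx()-style context where the hctx
has already been looked up (the surrounding variables and the error
handling here are illustrative, not quoted from the actual patch):

    /*
     * hctx->cpumask is set up from cpu_possible_mask and is not
     * updated on CPU hotplug, so every CPU in it may be offline.
     * cpumask_first_and() then returns a value >= nr_cpu_ids,
     * which must be rejected before being used as a CPU index.
     */
    cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
    if (cpu >= nr_cpu_ids)
            return ERR_PTR(-EINVAL);
    data.ctx = __blk_mq_get_ctx(q, cpu);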