On Mon, Jun 14, 2021 at 01:37:06PM +0200, Daniel Wagner wrote:
> On Tue, Jun 08, 2021 at 08:33:39PM +0200, Daniel Wagner wrote:
> > cpumask_first_and() returns >= nr_cpu_ids if the two provided masks do
> > not share a common bit. Verify we get a valid value back from
> > cpumask_first_and().
>
> So I got feedback on this issue (but not on the patch itself yet). The
> system starts with 16 virtual CPU cores, and during the test 4 cores are
> removed[1]. As soon as there is an error on the storage side, the reset
> code on the host ends up in this path and crashes. I still don't
> understand why the CPU removal is not updating the CPU mask correctly
> before we hit the reset path. I'll continue to investigate.

We don't update hctx->cpumask when a CPU is added or removed; it is
assigned from cpu_possible_mask from the beginning. This is a
long-standing issue, which can be triggered when all CPUs in
hctx->cpumask become offline. The thing is that only
nvmf_connect_io_queue() allocates a request via a specified hctx.

thanks,
Ming
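
For reference, a minimal sketch of the check the patch description calls
for, assuming a blk_mq_alloc_request_hctx()-style context where the hctx
has already been looked up (the surrounding variables and the error
handling here are illustrative, not quoted from the actual patch):

    /*
     * hctx->cpumask is set up from cpu_possible_mask and is not
     * updated on CPU hotplug, so every CPU in it may be offline.
     * cpumask_first_and() then returns a value >= nr_cpu_ids,
     * which must be rejected before being used as a CPU index.
     */
    cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
    if (cpu >= nr_cpu_ids)
            return ERR_PTR(-EINVAL);
    data.ctx = __blk_mq_get_ctx(q, cpu);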