Re: [PATCH blktests v3] nvme/046: test queue count changes on reconnect

Daniel Wagner <dwagner@xxxxxxx> · Fri, 16 Sep 2022 08:30:03 +0200

On Thu, Sep 15, 2022 at 05:41:46PM +0200, Hannes Reinecke wrote:
> > It looks like the reasoning here didn't take into consideration the
> > scenario we have here. I'd say we should not do it and handle it the
> > same way we do with tcp/rdma.
> > 
> But that is wrong.
> When we are evaluating the NVMe status we _need_ to check the DNR bit;
> that's precisely what it's for.

I asked what the desired behavior is and Sagi pointed out:

> DNR means do not retry the command, it says nothing about do not attempt
> a future reconnect...

I really don't care too much; it's just that fc and tcp/rdma do not behave
the same in this regard, which makes this test trip over. As long as you two
can agree on the 'correct' behavior, I can do the work.
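
To make the difference concrete, here is a minimal sketch of the two
reconnect policies under discussion. It is an illustration only, not the
actual driver code: the enum, the function name and the policy names are
hypothetical, and the DNR mask is assumed to match the kernel's NVME_SC_DNR
(0x4000) as applied to a positive NVMe status value. Negative values are
assumed to be errno-style transport errors, which carry no DNR information.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror NVME_SC_DNR: the "Do Not Retry" flag in the status. */
#define SKETCH_NVME_SC_DNR 0x4000

/* Hypothetical policy names, used only for this illustration. */
enum sketch_policy {
	SKETCH_ALWAYS_RECONNECT,	/* DNR does not block a later reconnect */
	SKETCH_HONOR_DNR,		/* DNR on the failed connect stops reconnects */
};

/*
 * Decide whether another reconnect attempt should be scheduled.
 * status > 0: NVMe status code (may carry the DNR flag)
 * status < 0: errno-style transport error (no DNR information)
 */
static bool sketch_should_reconnect(enum sketch_policy policy, int status)
{
	if (policy == SKETCH_HONOR_DNR &&
	    status > 0 && (status & SKETCH_NVME_SC_DNR))
		return false;

	return true;
}
```

With a status that has DNR set, the first policy keeps reconnecting and the
second gives up, which is the mismatch that would make a transport-agnostic
test behave differently depending on the transport.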

> The real reason here is a queue inversion with fcloop; we've had them for
> ages, and I'm not surprised that it pops off now.

Could you elaborate?

> In fact, I was quite surprised that I didn't hit it when updating blktests;
> in previous versions I had to insert an explicit 'sleep 2' before disconnect
> to make it work.
> 
> I'd rather fix that than reverting FC to the (wrong) behaviour.

Sure, no problem with this approach.