FYI, each blktests test case can define DMESG_FILTER so that it does not fail
on specific keywords in dmesg. Test cases meta/011 and block/028 are reference
use cases.
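
For illustration, a rough sketch of the idea, assuming DMESG_FILTER holds a filter command that the kernel log is piped through before it is checked for failure keywords. The filter_dmesg helper and the sample log are hypothetical stand-ins, not blktests code; meta/011 and block/028 show the real usage.

```shell
#!/bin/bash
# Assumption: DMESG_FILTER is a command that dmesg lines are piped through,
# and lines it drops are never seen by the failure check.
DMESG_FILTER="grep --invert-match 'connect request for invalid subsystem'"

# Hypothetical stand-in for the harness side: apply the filter to stdin.
filter_dmesg() {
	eval "$DMESG_FILTER"
}

# Sample input for the sketch only.
sample_log='nvmet: connect request for invalid subsystem blktests-subsystem-1!
nvme nvme0: NVME-FC{0}: reconnect failure'

# Only lines that survive the filter would be checked.
printf '%s\n' "$sample_log" | filter_dmesg
```

The pattern above is only an example; a real test would filter whatever message it expects to trigger on purpose.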
Ah okay, let me look into it.
So I made the state read function a bit more robust (test if state file
exists) and it turns out this made rdma happy(??) but tcp is still
breaking.
s/tcp/fc/
On closer inspection I see the following sequence for fc:
[399664.863585] nvmet: connect request for invalid subsystem blktests-subsystem-1!
[399664.863704] nvme nvme0: Connect Invalid Data Parameter, subsysnqn "blktests-subsystem-1"
[399664.863758] nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16770)
[399664.863784] nvme nvme0: NVME-FC{0}: reconnect failure
[399664.863837] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
When the host tries to reconnect to a non-existent controller (the test
called _remove_nvmet_subsystem_from_port()) the target returns 0x4182
(NVME_SC_DNR|NVME_SC_READ_ONLY(?)).
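
As a quick sanity check on that value, here is a small sketch that splits the status into the DNR bit and the status code. The constants are as I recall them from include/linux/nvme.h and should be treated as assumptions, as should the 0x7ff mask (status code plus status code type); decode_status is a hypothetical helper. Note that 16770 from the log is 0x4182, and that the "Connect Invalid Data Parameter" message suggests 0x182 is meant here as the fabrics connect invalid-parameter code rather than a read-only status, though that reading is an assumption too.

```shell
#!/bin/bash
# Assumed values, recalled from include/linux/nvme.h; verify before relying on them.
NVME_SC_DNR=0x4000
NVME_SC_CONNECT_INVALID_PARAM=0x182

# Hypothetical helper: print the status code and whether DNR is set.
decode_status() {
	local status=$(( $1 ))
	local dnr=$(( (status & NVME_SC_DNR) != 0 ))
	# Assumption: the low 11 bits carry the status code and status code type.
	local code=$(( status & 0x7ff ))
	printf 'status=0x%x code=0x%x dnr=%d\n' "$status" "$code" "$dnr"
}

decode_status 16770   # the decimal value from the "Reconnect attempt failed" line
decode_status 0x4182  # the same value in hex
```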
That is not something that the target is supposed to be doing, I have no
idea why this is sent. Perhaps this is something specific to the fc
implementation?
So arguably fc behaves correctly by stopping the reconnects. tcp and rdma
just ignore the DNR.
DNR means do not retry the command, it says nothing about do not attempt
a future reconnect...
If we agree that the fc behavior is the right one, then the nvmet code
needs to be changed so that when the qid_max attribute changes it forces
a reconnect. The trick with calling _remove_nvmet_subsystem_from_port()
to force a reconnect is not working. And tcp/rdma need to honor the
DNR.
tcp/rdma honor DNR afaik.