Re: [PATCH] nvme: Revert: Fix controller creation races with teardown flow

James Smart <james.smart@xxxxxxxxxxxx> · Fri, 28 Aug 2020 13:24:10 -0700

On 8/28/2020 12:08 PM, Sagi Grimberg wrote:

The indicated patch introduced a barrier in the sysfs_delete attribute
for the controller that rejects the request if the controller isn't
created. "Created" is defined as at least 1 call to nvme_start_ctrl().

This is problematic in error-injection testing.  If an error occurs on
the initial attempt to create an association and the controller enters
reconnect(s) attempts, the admin cannot delete the controller until
either there is a successful association created or ctrl_loss_tmo
times out.

Where this issue is particularly hurtful is when the "admin" is the
nvme-cli, it is performing a connection to a discovery controller, and
it is initiated via auto-connect scripts.  With the FC transport, if the
first connection attempt fails, the controller enters a normal reconnect
state but returns control to the cli thread that created the controller.
In this scenario, the cli attempts to read the discovery log via ioctl,
which fails, causing the cli to see it as an empty log and then proceeds
to delete the discovery controller. The delete is rejected and the
controller is left live. If the discovery controller reconnect then
succeeds, there is no action to delete it, and it sits live doing 
nothing.

This is indeed a regression.

Perhaps we should also revert:
12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more than 
transport lookups")

Which inherently caused this by removing the serialization of
.create_ctrl()...

no, I believe the patch on the semaphore is correct. Otherwise - things 
can be blocked a long time.. a minute (1 cmd timeout) or even multiple 
minutes in the case where a command failure in core layers effectively 
gets ignored and thus doesn't cause the error path in the transport.  
There can be multiple /dev/nvme-fabrics commands stacked up that can 
make the delays look much longer to the last guy.

as far as creation vs teardown... yeah, not fun, but there are other 
ways to deal with it. FC: I got rid of the separate create/reconnect 
threads a while ago thus the return-control-while-reconnecting behavior, 
so I've had to deal with it.  It's one area it'd be nice to see some 
convergence in implementation again between transports.

-- james