Re: [PATCH] nvme: Revert: Fix controller creation races with teardown flow

Sagi Grimberg <sagi@xxxxxxxxxxx> · Tue, 1 Sep 2020 15:01:19 -0700

This is indeed a regression.

Perhaps we should also revert:
12a0b6622107 ("nvme: don't hold nvmf_transports_rwsem for more 
than transport lookups")

Which inherently caused this by removing the serialization of
.create_ctrl()...

no, I believe the patch on the semaphore is correct. Otherwise - 
things can be blocked a long time.. a minute (1 cmd timeout) or 
even multiple minutes in the case where a command failure in core 
layers effectively gets ignored and thus doesn't cause the error 
path in the transport. There can be multiple /dev/nvme-fabrics 
commands stacked up that can make the delays look much longer to 
the last guy.

as far as creation vs teardown... yeah, not fun, but there are 
other ways to deal with it. FC: I got rid of the separate 
create/reconnect threads a while ago thus the 
return-control-while-reconnecting behavior, so I've had to deal 
with it.  It's one area it'd be nice to see some convergence in 
implementation again between transports.

Doesn't fc have a bug there? in create_ctrl after flushing the
connect_work, what is telling it if delete is running in with it
(or that it already ran...)

I guess I don't understand what the bug is you are thinking about. 
Maybe there's a short period that the ctrl ptr is perhaps freed, thus 
the pointer shouldn't be used - but I don't see it as almost 
everything is simply looking at  the value of the pointer, not 
dereferencing it.

I'm referring to nvme_fc_init_ctrl, if delete happens while it
is waiting in flush_delayed_work(&ctrl->connect_work); won't you
dereference and return a controller that is possibly already
deleted/freed?

ok - that matches my "short period" and it is possible as there's one 
immediate printf that may dereference the ptr. After that, it's 
comparisons of the pointer value.  I can move the printf to avoid the 
issue.  That window's rather small.

But you also return back &ctrl->ctrl, that is another dereference, and
what will make ctrl to be an ERR_PTR?

Anyway, we should probably come up with something more robust...

I do have a bug or two  with delete association fighting with 
create_association - but it's mainly due to nvme_fc_error_recovery 
not the delete routine. I've reworked this area after seeing your 
other patches and will be posting after some more testing.  But no 
reason for synchronizing all ctrl creates.

Is it that big of an issue? it should fail rather quickly shouldn't it?

not sure what you are asking.   if it's how long to fail the creation of 
a new association - it's at least 60s (an admin command timeout).

That's the worst case (admin command timeout), but is it the most common
case?

Would making the timeouts shorter in the initial connect make sense?
Just throwing out ideas...