Re: [patch v4 4/5] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL

Mike Christie <michael.christie@xxxxxxxxxx> · Thu, 29 Oct 2020 11:20:09 -0500




On 10/29/20 6:53 AM, Muneendra Kumar M wrote:
Hi Mike,
Below are my replies.

-----Original Message-----
From: Mike Christie [mailto:michael.christie@xxxxxxxxxx]
Sent: Monday, October 26, 2020 10:45 PM
To: Muneendra <muneendra.kumar@xxxxxxxxxxxx>; linux-scsi@xxxxxxxxxxxxxxx;
hare@xxxxxxx
Cc: jsmart2021@xxxxxxxxx; emilne@xxxxxxxxxx; mkumar@xxxxxxxxxx
Subject: Re: [patch v4 4/5] scsi_transport_fc: Added a new rport state
FC_PORTSTATE_MARGINAL

On 10/22/20 7:34 AM, Muneendra wrote:
@@ -2071,6 +2074,7 @@ fc_eh_timed_out(struct scsi_cmnd *scmd)
  {
  	struct fc_rport *rport = starget_to_rport(scsi_target(scmd->device));

+	fc_rport_chkmarginal_set_noretries(rport, scmd);
  	if (rport->port_state == FC_PORTSTATE_BLOCKED)
  		return BLK_EH_RESET_TIMER;

If we are in port state marginal above, then we will try to abort the cmd,
but if while doing the abort we call fc_remote_port_delete and
fc_remote_port_add then the port state will be online when the EH callouts
complete. In this case, the port state is online in the end, but we would
fail the command like it was in marginal.
[Muneendra] I have to make sure the flag is set after the check for the
blocked state. If blocked, it returns BLK_EH_RESET_TIMER, so it will restart
the eh timer. The io will "sit out" like this, pending, until either the
adapter fails it back due to logout or io completion, or fast io fail or
rport devloss times out and invokes the abort handler to force an abort.

Hey,

I'm not sure if we are talking about the same thing. If port state is 
marginal above, then we set the NORETRIES bit then return BLK_EH_DONE 
which will start up the scsi eh_abort_handler and if that fails the rest 
of the scsi eh_*_handlers.

While we are calling the eh handlers, if the driver does a 
fc_remote_port_delete then fc_remote_port_add we still have the 
NORETRIES bit set, so when we return from the eh_*_handlers we will fail 
the IO upwards.

I was trying to ask if you wanted the IO failed upwards in that case.
Because the port state went to online, did you want the normal (cleared
NORETRIES bit) cmd retry behavior? It sounds like below you want the
cleared NORETRIES bit behavior, right?
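
The sequence in question can be sketched as a small userspace model. This is
not the kernel code: every name below (model_rport, model_cmd,
model_eh_timed_out and so on) is hypothetical and only mirrors the behavior
described in this thread. It assumes the flag is set after the blocked check,
as proposed in the reply above, and the "rechecked" variant is just one
possible reading of the cleared-bit behavior, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are NOT the kernel's definitions. */
enum model_port_state { PORT_ONLINE, PORT_BLOCKED, PORT_MARGINAL };
enum model_eh_ret { EH_DONE, EH_RESET_TIMER };

struct model_rport { enum model_port_state state; };
struct model_cmd { bool noretries; };

/* Models the timed-out path: blocked restarts the timer, marginal
 * marks the command no-retry, and anything else proceeds to the
 * abort/eh handlers (EH_DONE). */
static enum model_eh_ret model_eh_timed_out(const struct model_rport *rport,
                                            struct model_cmd *cmd)
{
	assert(rport != NULL && cmd != NULL);
	if (rport->state == PORT_BLOCKED)
		return EH_RESET_TIMER;
	if (rport->state == PORT_MARGINAL)
		cmd->noretries = true;
	return EH_DONE;
}

/* Models fc_remote_port_delete: a marginal port may become blocked. */
static void model_port_delete(struct model_rport *rport)
{
	rport->state = PORT_BLOCKED;
}

/* Models fc_remote_port_add: the port comes back online. */
static void model_port_add(struct model_rport *rport)
{
	rport->state = PORT_ONLINE;
}

/* Behavior as posted: the flag alone decides, so it stays "sticky"
 * even if the port went back online while the eh handlers ran. */
static bool model_fail_upwards(const struct model_cmd *cmd)
{
	return cmd->noretries;
}

/* Hypothetical alternative: honor the flag only if the port is still
 * marginal when the eh handlers return. */
static bool model_fail_upwards_rechecked(const struct model_rport *rport,
                                         const struct model_cmd *cmd)
{
	return cmd->noretries && rport->state == PORT_MARGINAL;
}
```

With a port that starts out marginal, calling model_eh_timed_out, then
model_port_delete and model_port_add, leaves the port online while
model_fail_upwards still reports true; the rechecked variant reports false
and would let the normal retry behavior apply.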



+		(rport->port_state != FC_PORTSTATE_MARGINAL)) {
  		spin_unlock_irqrestore(shost->host_lock, flags);
  		return;

It looks like if fc_remote_port_delete is called, then we will allow that
function to set the port_state to blocked. If the problem is resolved then
fc_remote_port_add will set the state to online. So it would look like the
port state is now ok in the kernel, but would userspace still have it in
the marginal port group?

Did you want this behavior or did you want it to stay in marginal until
your daemon marks it as online?
[Muneendra] We need this behavior. The user daemon
should not depend on the rport_state to move a path out of the marginal path
group. It should depend only on RSCN and LINKUP events or manual
intervention. The events that we look out for (RSCN for target-side cable
bounces and link up/down for initiator cable bounces) will result in
port state changes, so although we don't drive one from the other, they are
correlated.

Regards,
Muneendra.