Re: [PATCH 01/15] zfcp: do not wait for rports to become unblocked after host reset

Benjamin Block <bblock@xxxxxxxxxxxxx> · Thu, 12 Oct 2023 15:54:52 +0200

Hey Hannes,

I've got a few questions re the rational for this change.

On Mon, Oct 02, 2023 at 05:49:13PM +0200, Hannes Reinecke wrote:
> zfcp_scsi_eh_host_reset_handler() would call fc_block_rport() to
> wait for all rports to become unblocked after host reset.
> But after host reset it might happen that the port is gone, hence
> fc_block_rport() might fail due to a missing port.
> But that's a perfectly legal operation; on FC remote ports might
> come and go.
> In the same vein FC HBAs are able to deal with ports being temporarily
> blocked, so really there is not point in waiting for all ports
> to become unblocked during host reset.

But in scsi_transport_fc.c we have this documented:

    * fc_block_scsi_eh - Block SCSI eh thread for blocked fc_rport
    * @cmnd: SCSI command that scsi_eh is trying to recover
    *
    * This routine can be called from a FC LLD scsi_eh callback. It
    * blocks the scsi_eh thread until the fc_rport leaves the
    * FC_PORTSTATE_BLOCKED, or the fast_io_fail_tmo fires. This is
    * necessary to avoid the scsi_eh failing recovery actions for blocked
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    * rports which would lead to offlined SCSI devices.
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So I don't understand what the real expectation by the SCSI EH call back for
host reset is then.

Is it that all objects (host/target ports/luns) are operational again once we
return to the EH thread, or is it ok that some parts are still being
recovered (as with our host reset handler, rports might still be blocked after
`zfcp_erp_wait()` finishes, because of how this is organized internally).

If it's the later, I'd think this change is fine. But then I'd wonder why this
function exists in the first place? Is it because in other EH steps it's more
important that rports are ready after the step (e.g. because a TUR is send
after, and if that fails, things get escalate unnecessarily)?

Oh.. speaking of that, we do send a TUR after host reset as well
(`scsi_eh_test_devices()`). So doesn't this break then if one or more rports
are sill blocked after host reset returns? 
    At least `zfcp_scsi_queuecommand()` will bail very early if the rport is
not ready (we call `fc_remote_port_chkready()` as more or less first thing),
and so `scsi_send_eh_cmnd()` that is used for the TUR will fail; then it might
be retried one time, but this is a tight loop without any delay, so I'd guess
this has a good chance to fail as well.
    And then we'd offline the whole host as further escalation, which would
*REALLY* suck (with no automatic recovery no less).

My impression from look at the code that follows `scsi_try_host_reset()` in
`scsi_error.c` really is, it rather expects things to be ready to be used
after, right there and then (admittedly, this is probably already today
problematic, as things might go back to not working concurrently because of
some fabric event.. but anyway, we can life with that off-chance it seems).

Or do I miss something?

> Hence remove the call to fc_block_rport() in host reset.
> 
> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> Cc: Steffen Maier <maier@xxxxxxxxxxxxx
> Cc: Benjamin Block <bblock@xxxxxxxxxxxxx>
> ---
>  drivers/s390/scsi/zfcp_scsi.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index b2a8cd792266..14f929cca271 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -383,10 +383,6 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>  	}
>  	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>  	zfcp_erp_wait(adapter);
> -	fc_ret = fc_block_scsi_eh(scpnt);
> -	if (fc_ret)
> -		ret = fc_ret;
> -
>  	zfcp_dbf_scsi_eh("schrh_r", adapter, ~0, ret);
>  	return ret;
>  }
> -- 
> 2.35.3
> 

-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Gregor Pillen         /         Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294