On 8/17/21 4:03 PM, Steffen Maier wrote: > On 8/17/21 2:54 PM, Hannes Reinecke wrote: >> On 8/17/21 1:53 PM, Benjamin Block wrote: >>> On Tue, Aug 17, 2021 at 11:14:13AM +0200, Hannes Reinecke wrote: >>>> @@ -383,9 +385,24 @@ static int >>>> zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt) >>>> } >>>> zfcp_erp_adapter_reopen(adapter, 0, "schrh_1"); >>>> zfcp_erp_wait(adapter); >>>> - fc_ret = fc_block_scsi_eh(scpnt); >>>> - if (fc_ret) >>>> - ret = fc_ret; >>>> +retry_rport_blocked: >>>> + spin_lock_irqsave(host->host_lock, flags); >>>> + list_for_each_entry(port, &adapter->port_list, list) { >>> >>> You need to take the `adapter->port_list_lock` to iterate over the >>> `port_list`. >>> >>> i.e.: read_lock_irqsave(&adapter->port_list_lock, flags); >>> >>>> + struct fc_rport *rport = port->rport; >>>> + >>>> + if (rport->port_state == FC_PORTSTATE_BLOCKED) { >>>> + if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT) >>>> + ret = FAST_IO_FAIL; >>>> + else >>>> + ret = NEEDS_RETRY; >>>> + break; >>>> + } >>>> + } >>>> + spin_unlock_irqrestore(host->host_lock, flags); >>>> + if (ret == NEEDS_RETRY) { >>>> + msleep(1000); >>>> + goto retry_rport_blocked; >>>> + } >>> >>> I really can't say I like this open coded FC code in the driver at all. >>> >>> Is there a reason we can't use `fc_block_rport()` for all the rports of >>> the adapter? > > Waiting for all rports to unblock in host_reset has been on my todo list > since we prepared the eh callbacks to get rid of scsi_cmnd with v4.18 > commits: > 674595d8519f ("scsi: zfcp: decouple our scsi_eh callbacks from scsi_cmnd") > 42afc6527d43 ("scsi: zfcp: decouple TMFs from scsi_cmnd by using > fc_block_rport") > 26f5fa9d47c1 ("scsi: zfcp: decouple SCSI setup of TMF from scsi_cmnd") > 39abb11aca00 ("scsi: zfcp: decouple FSF request setup of TMF from > scsi_cmnd") > e0116c91c7d8 ("scsi: zfcp: split FCP_CMND IU setup between SCSI I/O and > TMF again") > 266883f2f7d5 ("scsi: zfcp: decouple TMF response handler from scsi_cmnd") > 822121186375 ("scsi: zfcp: decouple SCSI traces for scsi_eh / TMF from > scsi_cmnd") > > But the synchronization is non-trivial as Benjamin's question shows. > There are also considerations about lock order, etc. > > I'm busy with other things, so don't hold your breath until I can review > and test the code; I don't want any regression in that recovery code. > >>> We already do use it for other EH callbacks in the same file, and you >>> already look up the rports in the adapters rport-list; so using that on >>> the rports in the loop, instead of open-coding it doesn't seem bad? Or >>> is there a locking problem? >>> >>> We might waste a few cycles with that, but frankly, this is all in EH >>> and after adapter reset.. all performance concerns went our of the >>> window with that already. >>> >> >> Question would be why we need to call fc_block_rport() at all in host >> reset. >> To my understanding a host reset is expected to do a full resync of the >> SAN topology, so the expectation is that after zfcp_erp_wait() the port >> list is stable (ie the HBA has finished processing all RSCNs related to >> the SAN resync). > > There is more to do in zfcp than in other FC HBA drivers, e.g. LUN open > recoveries and how they related to rport unblock: > v4.10 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN > recovery"). > The rport unblock is async to our internal recovery. zfcp_erp_wait() > only waits for the latter by design. > >> So can't we just drop the fc_block_rport() call here? > > I don't think so. > >> All the other FC drivers do fine without that ... > > It would have been nice to have a common interface for all scsi_eh > scopes. I.e. fc_block_host(struct Scsi_Host*) like we already have for > fc_block_scsi_eh(struct scsi_cmnd*) and fc_block_rport(struct fc_rport*) > [the latter having been introduced at the time of above eh callback > preparations]. > But if zfcp is the only one needing it for host_reset, having the code > only in zfcp seems fine to me. > > Right. Just wanted to clarify that. If we need to use fc_block_rport() in host reset so be it; just wanted to clarify if this _really_ is the case (and not just some copy'n'paste stuff). I'll be reworking the patch to call fc_block_rport(). Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@xxxxxxx +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer