RE: [RFC][PATCH 2/6] fnic: add fnic_scsi.c and fnic_io.h.

>Mike Christie wrote:
>> Joe Eykholt wrote:
>>> James Smart wrote:
>>>>
>>>> Mike Christie wrote:
>>>>>> Well - what should be happening is - prior to the reset or as part of
>>>>>> it, the fc transport fc_remote_port_delete() call should be made on
>>>>>> all those remote ports that connectivity is about to be terminated on.
>>>>>> This will place all the associated targets/luns on those rports into a
>>>>>> blocked state, and start the devloss timer on them.  This will suspend
>>>>>> the eh path as well.  Thus, things suspend until either the
>>>>>> driver/fcoe
>>>>> What do you mean by that? For lpfc it will or for this driver? This
>>>>> driver does not have that block call like lpfc_block_error_handler, so
>>>>> if the rport event occurs after the scsi eh is running we do not
>>>>> suspend the eh.
>>>>>
>>>>> So below I am saying we should make the lpfc_block_error_handler
>>>>> functionality and the equivalent in the qla2xxx and mpfc common so
>>>>> libfc/fcoe and fnic can use it.
>>>> Well there's successive layers of the onion here. And you're right, one
>>>> of them is the block_error_handler.  Agreed, all of this should be
>>>> common.
>>>>
>>>> -- james s
>>>>
>>>
>>> I think you're both on the right track.  When we reset the local port,
>>> it should make all remote ports non-ready ... we no longer have a PLOGI
>>> to them.  Until we redo FLOGI and discovery, no SCSI ops will succeed.
>>> fc_lport_reset() calls fc_lport_set_fid, which calls
>>> lp_rport_reset_list() ... but that doesn't seem to do much to rports
>>> other than the directory server.
>>>
>>> fc_rport_reset() puts the rport in state INIT, but I don't think that's
>>> enough.  Maybe that's where the remote port should get blocked.  Sound
>>> right?
>>>
>> 
>> Did you see the thread
>> http://www.open-fcoe.org/pipermail/devel/2008-July/000394.html
>> I think we basically said we need to overhaul the fc class
>> fc_remote_port and the libfc's use - some of the discussion went off
>> list into some call. James and Chris were going to work on it. I think
>> both got busy with other issues or are still working on it.

>Hit send too soon.

>We basically should be deleting the port for these scenarios, then that
>sets the dev_loss_tmo and blocks the rport. We should then wait in the
>eh for the rport to be added back or for the dev_loss_tmo to fire.

Apologies for not getting back earlier; I just got a chance to look at
this thread and review the code. Here's a summary of what I think
should be fixed. Please correct me if I'm wrong.

1) fnic_host_reset should wait after the call to lport_reset to allow all
rports to come back up or get offlined after dev_loss_tmo. Ideally,
there should be a wait for the lport to become ready, followed by a wait
for each of the discovered rports to become unblocked.

Currently, the fnic driver relies on HOST_RESET_SETTLE_TIME (10
secs) after a successful host reset (since skip_settle_delay is 0) before
TUR is sent to online devices. If fabric login succeeds and the
rports come back up within that time, the TUR succeeds and the
commands get moved out of the eh_work_q.

At a minimum, the wait for and check of lport state after initiating
lport_reset can be moved down into the fnic driver instead of relying on
SCSI-ML to do it.
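
To make the waiting in (1) concrete, here is a rough user-space sketch
of a bounded wait that first needs the lport to be ready and then needs
every discovered rport to have left the blocked state (either back
online, or removed after dev_loss_tmo). Every name below (sim_lport,
sim_wait_after_lport_reset, the tick callback) is made up for
illustration and is not a libfc or fnic interface; a real driver would
sleep between checks rather than call a tick function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; not the libfc or fnic types. */
enum sim_rport_state { SIM_RPORT_BLOCKED, SIM_RPORT_ONLINE, SIM_RPORT_GONE };

struct sim_lport {
    bool ready;                     /* fabric login has finished */
    enum sim_rport_state *rports;   /* discovered remote ports */
    size_t nr_rports;
};

/* Called once per polling interval to advance the simulated fabric. */
typedef void (*sim_tick_fn)(struct sim_lport *lp, unsigned int tick);

/* True once no rport is still blocked (each is online or removed). */
static bool sim_rports_settled(const struct sim_lport *lp)
{
    for (size_t i = 0; i < lp->nr_rports; i++)
        if (lp->rports[i] == SIM_RPORT_BLOCKED)
            return false;
    return true;
}

/*
 * Poll for up to max_ticks intervals after a local port reset.
 * Returns 0 when the lport is ready and every rport has settled,
 * -1 if the budget runs out first.
 */
static int sim_wait_after_lport_reset(struct sim_lport *lp,
                                      sim_tick_fn tick,
                                      unsigned int max_ticks)
{
    assert(lp != NULL && tick != NULL);
    for (unsigned int t = 0; t < max_ticks; t++) {
        if (lp->ready && sim_rports_settled(lp))
            return 0;
        tick(lp, t);    /* a driver would sleep here instead */
    }
    return (lp->ready && sim_rports_settled(lp)) ? 0 : -1;
}

/* Example fabric: login completes at tick 2, rports settle at tick 4. */
static void sim_example_tick(struct sim_lport *lp, unsigned int tick)
{
    if (tick == 2)
        lp->ready = true;
    if (tick == 4) {
        lp->rports[0] = SIM_RPORT_ONLINE;
        lp->rports[1] = SIM_RPORT_GONE;    /* dev_loss_tmo fired */
    }
}
```

With a two-rport lport and sim_example_tick, a budget of 10 ticks
returns 0, while a budget of 3 ticks returns -1 because the rports are
still blocked when the budget expires.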

2) fnic_abort_cmd and fnic_device_reset wait for the rport to unblock as
their first step, since there is no point in sending any recovery
commands to a remote port that is currently gone. However, it's not clear
why fnic_host_reset needs to block as its first step. All it does is
clean up all the SCSI commands locally and then reset the local port,
which in turn resets the rports and restarts fabric login. As discussed
in (1) above, it is clear why blocking the thread is required after the
call to lport_reset; it is not clear why the first step in host_reset
should be to block the error handler (unless you never said that, and I
misunderstood).

3) In queuecommand, if the local port is not ready after the rport check
passes, return busy. This covers an rport transition from online to
blocked that starts just after we did the rport check. Eventually, if
dev_loss_tmo really fires and the rport goes away, the scsi_device will
get blocked and SCSI-ML will block the device queue. If the rport comes
back, subsequent queuecommand calls will be able to issue IOs
successfully.
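
The ordering of the two checks in (3) could look roughly like the
user-space sketch below. The types and result codes are hypothetical
stand-ins: in the driver, the rport check would be something like
fc_remote_port_chkready() and the busy return would be
SCSI_MLQUEUE_HOST_BUSY, but none of that is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for illustration only. */
enum sim_queue_result {
    SIM_QUEUED,        /* command handed to the hardware */
    SIM_HOST_BUSY,     /* ask the midlayer to retry later */
    SIM_FAIL_RPORT     /* rport check failed; complete with its error */
};

/* Hypothetical snapshot of the state seen by one queuecommand call. */
struct sim_path {
    bool rport_ok;     /* result of the rport readiness check */
    bool lport_ready;  /* local port has finished fabric login */
};

static enum sim_queue_result sim_queuecommand(const struct sim_path *p)
{
    assert(p != NULL);

    /* Step 1: rport check, as the driver already does. */
    if (!p->rport_ok)
        return SIM_FAIL_RPORT;

    /*
     * Step 2: the rport may have started its online -> blocked
     * transition right after step 1, so also check the local port
     * and report busy instead of issuing the IO.
     */
    if (!p->lport_ready)
        return SIM_HOST_BUSY;

    return SIM_QUEUED;
}
```

The case of interest is rport_ok set with lport_ready clear, which
yields SIM_HOST_BUSY so that the command is retried rather than failed.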

Let me know if you have more review comments.

thanks,
--abhijeet

