SRR response handling.

Good afternoon, I hope the week is going well for everyone.

Himanshu/Quinn, it has been a while since we spoke but I think I have
the right group targeted for this.  I see that you have evolved from
Qlogic to Cavium to Marvell e-mail addresses, which must indicate the
transition of who is making the hardware... :-)

I've been away focused on other issues, but I noted with interest that
you are now distributing, with your 32Gb/s capable driver, the
Qlogic-Target/SCST interface driver that we developed.  I'm pleased to
see that it must have utility with respect to providing support for
SCST on mainstream Linux.

I wanted to bounce an issue off your collective judgement since you
may be the only people that know the answer to this and we may be the
only ones that can demonstrate and/or test the issue.

In the following commit:

commit 2c39b5c ("qla2xxx: Remove SRR code")

Himanshu had pulled the SRR code handling out of the target driver.
The changelog indicates that the code was removed since no one
appeared to be using it but it could be re-added if there was a need
for it.

While SRR is ostensibly in support of tape, a block target initiator
on an FCoE fabric will issue SRRs in response to frame drops on the
underlying Ethernet fabric.  In the following commit:

commit 6f58c78 ("qla2xxx: Fix kernel panic on selective retransmission request")

We fixed a bug that caused the Qlogic target driver to immediately
panic the kernel if an SRR was received and debugging was enabled.  We
found this bug while debugging a problem caused by the fact that the
Clipper ASICs on Cisco Nexus 7K switches are not configured by
default to support lossless Fibre Channel frame transport.

We have been using a version of the target driver that predates the
removal of the SRR code.  It has been performing flawlessly, which is
why we haven't spent time monkeying with upgrades.

Following a recent firmware upgrade that we conducted on the Nexus
switches, we ended up in a situation where one of the Ethernet
interfaces in a port-channel began dropping frames at high rates.
This caused one of our target servers, running a 4Gb/s 2462 HBA in
target mode, to immediately panic on what appears to be the BUG_ON
assertion at the start of the scst_tgt_cmd_done() function.

We were going to run this down when we noticed that you had officially
married the Qlogic target driver with the SCST interface driver.  So
we rolled a 4.9.190 kernel with the driver out of the SCST trunk to
see how it handled all of this since we would prefer to be on
something that you guys are directly and actively maintaining.

We tested the resultant kernel and target driver and it appears to be
functioning fine.  After thinking about things for a bit, we elected
not to expose the target server to the initiators on the other side of
the port-channel which has the interface with the high frame discard
rate.  The rationale for this is that we are unsure of what the
behavior is going to be when a block initiator issues an SRR against a
target server that has had the SRR handling code removed.

We currently have three target servers that are handling thousands of
SRRs from these initiators, day in and day out, with no ill effects,
using the old driver that has SRR handling support.  One of the local
patches we carry has debug code that prints out which initiator has
issued an SRR.  So we know that the SRR handling code is being invoked
and appears to be coping with the situation just fine, which is what
is giving us pause with respect to exposing the new target server to
those initiators.

If the initiators throw an error, that is well and good, but we are
obviously concerned about possible silent failures and/or corruption.

Since all of this is so esoteric and uncommon, we wanted to bounce this
off the experts to get your sense of what we are dealing with.
Obviously, if we can help debug and/or improve the driver or provide a
rationale for folding the SRR code back into the driver, we would be
more than happy to contribute and test.

Let me know your thoughts and we will go from there.

Have a good day.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686            EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"If you plugged up your nose and mouth right before you sneezed, would
 the sneeze go out your ears or would your head explode?  Either way I'm
 afraid to try."
                                -- Nick Kean
