Adding correct Quinn and removing qlogic.com email ID. It's nice to hear from you Dr Greg __ We will look at the request and get back to you. Thanks, Himanshu On 9/3/19, 3:39 PM, "Dr. G.W. Wettstein" <greg@xxxxxxxxxxxxxxxxx> wrote: External Email ---------------------------------------------------------------------- Good afternoon, I hope the week is going well for everyone. Himanshu/Quinn, it has been a while since we spoke but I think I have the right group targeted for this. I see that you have evolved from Qlogic to Cavium to Marvell e-mail addresses, which must indicate the transition of who is making the hardware... :-) I've been away focused on other issues but I noted with interest that you are now distributing the Qlogic-Target/SCST interface driver that we developed with your 32GBPS hardware capable driver. I'm pleased to see that it must have utility with respect to providing support for SCST on mainstream Linux. I wanted to bounce an issue off your collective judgement since you may be the only people that know the answer to this and we may be the only ones that can demonstrate and/or test the issue. In the following commit: commit 2c39b5c ("qla2xxx: Remove SRR code") Himanshu had pulled the SRR code handling out of the target driver. The changelog indicates that the code was removed since no one appeared to be using it but it could be re-added if there was a need for it. While SRR is ostensibly in support of tape, a block target initiator on an FCOE fabric will issue SRR's in response to frame drops on the underlying ethernet fabric. In the following commit: commit 6f58c78 ("qla2xxx: Fix kernel panic on selective retransmission request") We fixed a bug that caused the Qlogic target driver to immediately panic the kernel if an SRR was received and debugging was enabled. We found this bug secondary to debugging an issue with the fact that the Clipper ASIC's on Cisco Nexus 7K switches are not configured by default to support lossless fibre-channel frame transport. We have been using a version of the target driver from before when the SRR code was removed that has been performing flawlessly which is why we haven't spent time monkeying with upgrades. Secondary to a recent firmware upgrade on the Nexus switches that we conducted, we ended up in a situation where one of the ethernet interfaces in a port-channel began dropping frames at high rates. This caused one of our target servers, running a 4GBPS 2462 HBA in target mode, to immediately panic on what appears to be the BUG_ON assertion at the start of the scst_tgt_cmd_done() function. We were going to run this down when we noticed that you had officially married the Qlogic target driver with the SCST interface driver. So we rolled a 4.9.190 kernel with the driver out of the SCST trunk to see how it handled all of this since we would prefer to be on something that you guys are directly and actively maintaining. We tested the resultant kernel and target driver and it appears to be functioning fine. After thinking about things for a bit we elected to not expose the target server to the initiators on the other side of the port-channel which has the interface with the high frame discard rate. The rationale for this is that we are unsure of what the behavior is going to be when a block initiator issues an SRR against a target server that has had the SRR handling code removed. We currently have three target servers that are handling thousands of SRR's from these initiators on a day in and day out basis with no ill effects, with the old driver that has SRR handling support. One of the local patches we carry has debug code that prints out which initiator has issued an SRR. So we know that the SRR handling code is being invoked and appears to be coping with the situation just fine, which is what is giving us pause with respect to exposing the new target server to those initiators. If the initiators throw an error that is well and good but we are obviously concerned about possible silent failures and/or corruption. Since all of this is so esoteric and uncommon we wanted to bounce this off the experts to get your sense on what we are dealing with. Obviously if we can help debug and/or improve the driver or provide a rationale for folding the SRR code back into the driver we would be more then happy to contribute and test. Let me know your thoughts and we will go from there. Have a good day. Dr. Greg As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 EMAIL: greg@xxxxxxxxxxxx ------------------------------------------------------------------------------ "If you plugged up your nose and mouth right before you sneezed, would the sneeze go out your ears or would your head explode? Either way I'm afraid to try." -- Nick Kean --