Re: Bad emulex/linux FC error handling behavior

James Smart wrote:
> However, the question in my mind is - why did you get to bus reset ?
Because the device is having intermittent problems? The whole error handler sequence fails (TUR failures, etc.), and it ends up marking the device offline. In the process it shoots everything else in the head. This is the behavior I'm having a problem with. I don't really care about the state of the failing device; it is having a physical problem. My problem is the remainder of the shared devices, which are having their activities interrupted. In many cases, those other machines/devices may not even have visibility to the failing device. It becomes a serious error isolation problem. From the perspective of other hosts, the only way to track the error down is to actually have an analyzer attached to the interrupted devices. Assuming it reproduces, the analyzer can then detect the reset and identify the source port it originated from. That machine may then be removed from the SAN. This whole process can be nearly impossible to perform at a customer's site.


> The reason for the behavior is to replicate the parallel scsi behavior,
> which is expected/required by many people.

I'm confused by this. For parallel SCSI, there were device dependencies due to the physical bus. The bus reset was standard error handling because a bad/failing SCSI device often put the bus in an unrecoverable state for the remainder of the devices. SPI also rarely had multiple initiators sharing devices. I was unaware of how big the "hammer" is that lpfc tends to use against the SAN when a device fails. I suspect that I'm not the only one. Is there a way to simulate the SPI behavior(?), short of actually resetting all attached devices? For that matter, I'm a little confused about what exactly the intended behavior is. Can you enlighten me? I could understand if it was just resetting all LUNs on a particular device, but it's resetting all attached devices.


> We can certainly discuss adding a parameter that
> controls the behavior, but this should be on a transport basis, not on
> an adapter-specific manner.

That's a great plan. To me it makes sense that this behavior should be transport dependent: I would want it for SPI, but not for FC or iSCSI. How likely is that to be accepted? The SCSI error handler seems to be completely transport independent. Initially, I targeted the Emulex driver because the QLogic driver already has a way to disable the behavior, and the LSI driver doesn't appear to support this behavior at all.
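
To make the transport-dependent idea concrete, here is a rough sketch of how the escalation decision could be keyed on the transport rather than on the adapter. Everything in it is hypothetical: the enum names, transport_allows_bus_reset() and next_eh_step() are made up for illustration and are not existing midlayer or lpfc interfaces. Taking a device straight offline once device reset fails on FC or iSCSI is also just an assumption about what a sensible policy might be.

```c
#include <stdbool.h>

/* Hypothetical transport types; not the kernel's own definitions. */
enum transport { TRANSPORT_SPI, TRANSPORT_FC, TRANSPORT_ISCSI };

/* Recovery steps, ordered from least to most disruptive. */
enum eh_step {
    EH_ABORT,
    EH_DEVICE_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET,
    EH_OFFLINE
};

/*
 * Assumed policy: only a physically shared bus (SPI) justifies a bus
 * reset. On FC or iSCSI the other devices do not depend on the failing
 * one, so resetting them only spreads the disruption.
 */
static bool transport_allows_bus_reset(enum transport t)
{
    return t == TRANSPORT_SPI;
}

/* Return the step to try after `failed` did not recover the device. */
static enum eh_step next_eh_step(enum transport t, enum eh_step failed)
{
    switch (failed) {
    case EH_ABORT:
        return EH_DEVICE_RESET;
    case EH_DEVICE_RESET:
        /* Skip the wide resets when the transport forbids them. */
        return transport_allows_bus_reset(t) ? EH_BUS_RESET : EH_OFFLINE;
    case EH_BUS_RESET:
        return EH_HOST_RESET;
    default:
        return EH_OFFLINE;
    }
}
```

With a policy like this, a failing FC device would go from a failed device reset directly to offline, while an SPI device would still get the bus reset and host reset steps first.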

