Re: LSI SAS changes SCSI address and by-path on hot-swap

Moore, Michael wrote:
Sorry for top posting, but Outlook just screws it all up.

The cards I've used are an LSI Logic SAS 3800X (8-port external PCI-X card with 2 x SFF-8470 SAS connectors) and an LSI SAS 3801E (8-port external PCI-e card with 2 x SFF-8088 SAS connectors).  Each connector has 4 SAS links.
The SAS protocol is backward compatible with SATA, so you can run SATA drives directly on a SAS cable.

So, in my setup, I basically have 1 drive per SAS link, with no expanders or anything fancy. The issue I mentioned happens to the 4 drives on the same connector. When the driver detects the new drive, it looks like it redetects all of the drives on the connector (or at least it reports one new drive plus the other existing drives). If you are in a directory on one of the mounted drives, you get I/O errors, as if the drive had been removed and then remounted, but not cleanly.
This has happened with the default CentOS 5 kernels (2.6.18-*.el5), vanilla 2.6.26, vanilla 2.6.30, and the latest Fedora.
The issue appeared on every one of them.

The udev rules used ENV{ID_PATH} to match the value that identifies which PCI ID and SAS phy on the SAS HBA a kernel-detected drive is attached to, and then created a symlink /dev/slot<Y> pointing at the /dev/sd<X> entry, where Y is the label on the slot of the hot-swap bays (a-h).  Here is an example of the rule:

KERNEL=="sd*", ENV{ID_PATH}=="pci-0000:04:00.0-sas-phy0:1*", SYMLINK+="slota%n"
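
To make the naming scheme concrete, here is a minimal Python sketch of how such rules could be generated and what symlink name each one would produce. It is only an illustration, not part of the actual setup: the phy-to-slot table is hypothetical beyond the phy0 -> "a" pairing shown in the rule above, the function names are made up, and the udev matching is approximated with shell-style globbing from the standard library.

```python
import fnmatch
import re

# PCI address copied from the example rule above.
PCI_ADDR = "0000:04:00.0"

# Hypothetical phy-to-slot table. Only phy0 -> "a" comes from the rule above;
# the other entries are assumptions for illustration.
PHY_TO_SLOT = {0: "a", 1: "b", 2: "c", 3: "d"}


def make_rule(phy, slot, pci=PCI_ADDR):
    """Build one udev rule line in the same shape as the example rule."""
    return ('KERNEL=="sd*", ENV{ID_PATH}=="pci-%s-sas-phy%d:1*", '
            'SYMLINK+="slot%s%%n"' % (pci, phy, slot))


def symlink_for(kernel_name, id_path, phy_to_slot=PHY_TO_SLOT, pci=PCI_ADDR):
    """Approximate which symlink name the rules would give a device.

    Returns None when no rule matches. %n is modeled as the trailing
    digits of the kernel name (empty for a whole disk such as "sdc").
    """
    if not fnmatch.fnmatchcase(kernel_name, "sd*"):
        return None
    number = re.search(r"\d*$", kernel_name).group()
    for phy, slot in phy_to_slot.items():
        pattern = "pci-%s-sas-phy%d:1*" % (pci, phy)
        if fnmatch.fnmatchcase(id_path, pattern):
            return "slot%s%s" % (slot, number)
    return None


if __name__ == "__main__":
    for phy, slot in sorted(PHY_TO_SLOT.items()):
        print(make_rule(phy, slot))
```

With this table, a whole disk on phy0 would get slota and its first partition slota1, regardless of which sd<X> name the kernel assigned after the swap.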

I did this because the device ID number that the kernel reports increments every time a drive is swapped.  So, even though you are using the same SAS channel, you do not get consistent drive numbering, and I had to go down to the SAS phy to get something consistent.  The SiI-3124/libata setup had consistent device IDs (the ID was tied to the SATA channel, and I used the device ID to do the mapping).  Perhaps udev is the reason for the issues, but I tend to think it is the way the SAS/SCSI subsystem works, as I have never seen the SATA/libata subsystem show this "rescan/remount" behavior.
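
One way to observe the renumbering described above is to snapshot the by-path symlinks before and after a hot-swap and compare them. The following is a rough Python sketch under the assumption that udev populates /dev/disk/by-path on the system; the function names are made up for illustration.

```python
import os


def snapshot_links(directory="/dev/disk/by-path"):
    """Return {link name: resolved target} for every symlink in directory."""
    links = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.path.realpath(path)
    return links


def changed_links(before, after):
    """Return the names that appeared, disappeared, or changed target."""
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Taking one snapshot before the swap and one after, then passing both to changed_links(), shows whether only the swapped drive's entry moved or whether the other drives on the same connector were renamed as well.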

This looks like a horrible bug for people running software RAID on the disks (or maybe even hardware RAID).

I don't seem to have this bug on the Ubuntu 2.6.24 kernel. My situation was similar, with the mainboard-integrated LSISAS 1068E, and it didn't happen to me, but that doesn't mean much...

Also, LSI controllers are very widely used by Linux users.
Have you tried reporting it here to get it fixed?
Or reporting it to LSI tech support? They are pretty responsive, even if their web interface is a bit strange.

I'm thinking about buying a few LSI HBA controllers for Linux software RAID use, probably external ones like the one you have, maybe attached to expanders. I'll keep my fingers crossed!

