On 13-04-24 05:08 AM, Chris Dunlop wrote:
> Hi,
>
> I have 3 boxes, each with an LSI 9211-8i and a mix of LSI expanders
> (Supermicro SAS-846EL2, SAS-826EL2). For some of my expanders,
> 'sg_ses -j' (originally sg3_utils 1.33, now 1.35) is showing:
>
>   Slot 24 [0,23] Element type: Array device slot
>   ...
>     Additional Element Status:
>       Transport protocol: Oxc not decoded
According to table 477 in section 7.6.1 of spc4r36f.pdf, protocol identifier 0xc is reserved. As far as I can see, it has never been assigned to any protocol. So either SuperMicro/LSI is getting creative or it is a case of GIGO (garbage in, garbage out).
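To check where a value like 0xc comes from, the first byte of each Additional Element Status descriptor (for example, taken from a hex dump of that page) can be decoded by hand. The following is a minimal sketch that assumes the SES-2 descriptor layout: byte 0 carries INVALID in bit 7, EIP in bit 4 and the PROTOCOL IDENTIFIER in bits 3..0, and byte 1 is the descriptor length. The protocol table is deliberately partial, and `proto_name` and `decode_aes_hdr` are hypothetical helper names, not part of sg3_utils.

```sh
#!/bin/sh
# Sketch (assumes the SES-2 layout): byte 0 of an Additional Element Status
# descriptor holds INVALID in bit 7, EIP in bit 4 and the PROTOCOL
# IDENTIFIER in bits 3..0; byte 1 holds the descriptor length.

# Partial table of SPC-4 protocol identifiers; values not listed here
# (including reserved ones such as 0xc) are reported as "not decoded".
proto_name() {
    case $1 in
        0)  echo "Fibre Channel" ;;
        1)  echo "parallel SCSI" ;;
        2)  echo "SSA" ;;
        3)  echo "IEEE 1394" ;;
        4)  echo "SRP" ;;
        5)  echo "iSCSI" ;;
        6)  echo "SAS" ;;
        7)  echo "ADT" ;;
        8)  echo "ATA" ;;
        9)  echo "UAS" ;;
        15) echo "no specific protocol" ;;
        *)  printf '0x%x not decoded\n' "$1" ;;
    esac
}

# decode_aes_hdr BYTE0 BYTE1 -- bytes may be given in hex, e.g. 0x1c
decode_aes_hdr() {
    b0=$(( $1 ))
    b1=$(( $2 ))
    invalid=$(( (b0 >> 7) & 1 ))
    eip=$(( (b0 >> 4) & 1 ))
    proto=$(( b0 & 0xf ))
    echo "invalid=$invalid eip=$eip length=$b1 protocol: $(proto_name "$proto")"
}
```

For example, `decode_aes_hdr 0x16 0x22` prints `invalid=0 eip=1 length=34 protocol: SAS`. A leading byte of 0x1c gives `invalid=0 eip=1 length=34 protocol: 0xc not decoded`, which corresponds to what the Slot 24 descriptors appear to be reporting.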
> ...where the slot contains a SATA device. It's always Slot 24, and other
> slots show up fine. E.g. on one of the expanders with SATA drives in both
> Slot 23 and 24:
>
>   h3# sg_ses -j /dev/sg81
>     LSI SAS2X36 0e0b
>   Primary enclosure logical identifier (hex): 500304800013453f
>   ...
>   Slot 23 [0,22] Element type: Array device slot
>     Enclosure Status:
>       Predicted failure=0, Disabled=0, Swap=0, status: OK
>       OK=0, Reserved device=0, Hot spare=0, Cons check=0
>       In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
>       App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
>       Ready to insert=0, RMV=0, Ident=0, Report=0
>       App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
>       Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
>     Additional Element Status:
>       Transport protocol: SAS
>       number of phys: 1, not all phys: 0, device slot number: 22
>       phy index: 0
>         device type: no device attached
>         initiator port for:
>         target port for: SATA_device
>         attached SAS address: 0x500304800013453f
>         SAS address: 0x5003048000134522
>         phy identifier: 0x0
>   Slot 24 [0,23] Element type: Array device slot
>     Enclosure Status:
>       Predicted failure=0, Disabled=0, Swap=0, status: OK
>       OK=0, Reserved device=0, Hot spare=0, Cons check=0
>       In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
>       App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
>       Ready to insert=0, RMV=0, Ident=0, Report=0
>       App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
>       Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
>     Additional Element Status:
>       Transport protocol: Oxc not decoded
>   ...
>
> This may be unrelated, but 'sg_ses -j' is also coming up with the
> following error on 3 of the 6 expanders identified as "LSI SAS2X36 0e0b"
> (this doesn't include any of the expanders with the Slot 24 problem):
>
>   join_work: oi=6, ei=255 (broken_ei=0) not in join_arr
This inconsistency error supports my GIGO theory.
> The expander types are:
>
> ----------------------------------------------------------------------
> $ for h in h1 h2 h3; do
>   echo "=== $h ==="
>   ssh $h 'lsscsi | grep enclosu'
> done
> === h1 ===
> [0:0:24:0]   enclosu LSI CORP SAS2X36          0717  -
> [0:0:27:0]   enclosu LSI      SAS2X36          0e0b  -
> [0:0:38:0]   enclosu LSI CORP SAS2X28          0717  -
> [0:0:62:0]   enclosu LSI      SAS2X36          0e0b  -
> [0:0:85:0]   enclosu LSI      SAS2X36          0e0b  -
> === h2 ===
> [0:0:25:0]   enclosu LSI CORP SAS2X36          0717  -
> [0:0:29:0]   enclosu LSI CORP SAS2X28          0717  -
> === h3 ===
> [0:0:23:0]   enclosu LSI CORP SAS2X36          0717  -
> [0:0:45:0]   enclosu LSI      SAS2X36          0e0b  -
> [0:0:57:0]   enclosu LSI CORP SAS2X28          0717  -
> [0:0:81:0]   enclosu LSI      SAS2X36          0e0b  -
> [0:0:88:0]   enclosu LSI      SAS2X36          0e0b  -
> ----------------------------------------------------------------------
>
> ...and they're daisy-chained like this:
>
> ----------------------------------------------------------------------
> for h in b2 b4 b5; do
>   echo "=== $h ==="
>   ssh $h 'find /sys/bus/scsi/devices/host0/ -name expander\* | egrep -v "bsg|sas_(expander|device)"'
> done
> === h1 ===
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1/port-0:1:25/expander-0:4
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:2
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3
> === h2 ===
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:1
> === h3 ===
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0
> /sys/bus/scsi/devices/host0/port-0:0/expander-0:0/port-0:0:0/expander-0:1
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:2
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3
> /sys/bus/scsi/devices/host0/port-0:1/expander-0:2/port-0:2:0/expander-0:3/port-0:3:0/expander-0:4
> ----------------------------------------------------------------------
>
> (Sorry, I don't know how to relate the /sys/bus/scsi stuff to the scsi ids
> or /dev/sgXX.)
Best to look at the mapping to /dev/bsg device nodes in this case.
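Along the same lines, each sg node can be traced back up through the SAS topology in sysfs. This is a minimal sketch. It assumes the usual Linux layout, in which /sys/class/scsi_generic/sgN/device is a symlink into the /sys/devices tree and SAS-attached targets sit below the expander-H:N directories they are reached through. `sg_to_expanders` is a hypothetical helper, and /sys/class/bsg, which also has expander-H:N entries, could be walked the same way.

```sh
#!/bin/sh
# Sketch: for every /dev/sgN, list the SAS expanders that appear in the
# sysfs path of its underlying SCSI device (outermost first).
# Assumes /sys/class/scsi_generic/sgN/device links into /sys/devices and
# that SAS targets live below their expander-H:N directories. The
# optional argument is an alternative sysfs root (default /sys).
sg_to_expanders() {
    sysfs=${1:-/sys}
    for link in "$sysfs"/class/scsi_generic/sg*; do
        # Skip an unmatched glob and nodes without a device link.
        [ -e "$link/device" ] || continue
        # Resolve symlinks to get the physical device path.
        path=$(cd -P "$link/device" && pwd -P)
        # Pull out each expander-H:N path component, joined by spaces.
        chain=$(printf '%s\n' "$path" | grep -o 'expander-[0-9]*:[0-9]*' | paste -s -d ' ' -)
        printf '/dev/%s: %s\n' "${link##*/}" "${chain:-(no expander)}"
    done
}
```

Run without arguments on each host. For an enclosure's sg node, the last expander listed should be the one whose integrated SES device answered `sg_ses`, which can then be matched against the `find` output quoted above.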
> The errors are showing up like:
>
> ----------------------------------------------------------------------
> $ for h in h1 h2 h3; do
>   ssh $h '
>     for d in $(lsscsi -tg | awk "\$2 == \"enclosu\" { print \$5 }"); do
>       echo "=== $(hostname):$d ==="
>       sg_ses -j $d 2>&1
>     done
>   '
> done | egrep 'LSI|^=|^Slot 24|join_work|not decoded' | sed -r 's/^=/\n=/'
>
> === h1:/dev/sg24 ===
> LSI CORP SAS2X36 0717
> Slot 24 [0,23] Element type: Array device slot
>
> === h1:/dev/sg27 ===
> LSI SAS2X36 0e0b
> Slot 24 [0,23] Element type: Array device slot
>     Transport protocol: Oxc not decoded
>
> === h1:/dev/sg38 ===
> LSI CORP SAS2X28 0717
>
> === h1:/dev/sg62 ===
> LSI SAS2X36 0e0b
> Slot 24 [0,23] Element type: Array device slot
>     Transport protocol: Oxc not decoded
>
> === h1:/dev/sg81 ===
> join_work: oi=6, ei=255 (broken_ei=0) not in join_arr
> LSI SAS2X36 0e0b
>
> === h2:/dev/sg25 ===
> LSI CORP SAS2X36 0717
> Slot 24 [0,23] Element type: Array device slot
>
> === h2:/dev/sg29 ===
> LSI CORP SAS2X28 0717
>
> === h3:/dev/sg23 ===
> LSI CORP SAS2X36 0717
> Slot 24 [0,23] Element type: Array device slot
>
> === h3:/dev/sg45 ===
> join_work: oi=6, ei=255 (broken_ei=0) not in join_arr
> LSI SAS2X36 0e0b
>
> === h3:/dev/sg57 ===
> LSI CORP SAS2X28 0717
>
> === h3:/dev/sg81 ===
> LSI SAS2X36 0e0b
> Slot 24 [0,23] Element type: Array device slot
>     Transport protocol: Oxc not decoded
>
> === h3:/dev/sg88 ===
> join_work: oi=6, ei=255 (broken_ei=0) not in join_arr
> LSI SAS2X36 0e0b
> ----------------------------------------------------------------------
>
> What should I be looking at, or what info can I provide to help track down
> these issues?
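The egrep filter in that loop prints every Slot 24 header, whether or not that slot is affected. As a result, you have to read around the matches to see which slots actually carry the odd protocol identifier. A small awk filter can tighten this: it remembers the most recent element header and prints it only when an undecoded transport protocol follows. This is a sketch that assumes each element in the `sg_ses -j` output starts with a line containing "Element type:", as in the output quoted above. `undecoded_slots` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: filter 'sg_ses -j' output (on stdin) down to the element header
# lines whose Additional Element Status has an undecoded transport
# protocol. Assumes each element starts with a line containing
# "Element type:", as in the quoted output.
undecoded_slots() {
    awk '
        /Element type:/                    { hdr = $0 }   # latest element header
        /Transport protocol:.*not decoded/ { print hdr }
    '
}
```

Usage would be along the lines of `sg_ses -j /dev/sg81 2>&1 | undecoded_slots`, inside the same per-host loop.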
I have a cheap SuperMicro disk enclosure (CSE-M35TQ) and could never find any info on its disk management chip (MG9072). My feeling was that the MG9072 came with generic settings that SuperMicro should have specialized for their product, a job SuperMicro did somewhat poorly. [At least that is good for my error checking code :-)] Also, if I put more than two disks in that enclosure, the SGPIO** protocol seems to fall apart, leading to complete GIGO.

So, if I were you, I'd be happy with any information you can get and not waste too much time over the rest. sg_ses has been tested with some higher-end enclosures, which are much more compliant, but many still have small quirks.

Doug Gilbert

** SAS-2 expanders tend to have integrated enclosure devices which communicate with enclosures via SGPIO.