I have a server (RHEL 5.3) connected to 2 extended SAN fabrics (across 2 sites, distance 1 ms, links are ISLs with 100 km long-distance buffer credits) via 2 lpfc HBAs (LPe1105-HP FC with the LPFC driver shipped with RHEL 5.3, version 8.2.0.33.3p).

A SAN fabric reconfiguration (DWDM ring failover from worker to protection) occurred yesterday after an intersite telco link switch that lasted less than 0.3 ms. Only one fabric was impacted, named FABRIC2. Our server is connected to the fabrics through 2 edge switches, so it is not directly connected to the core switches on which the link failure occurred.

From then on, our server (which accesses the LUNs from our 2 sites through the 2 fabrics) started to climb in terms of load average (up to 250 for a dual-processor quad-core machine!) with a high percentage of iowait (up to 50%).

We did some testing, bypassing DM-MP by issuing dd commands to the physical /dev/sdX devices (more than 30 LUNs are presented to the server, each seen through 4 paths, making more than 120 /dev/sd devices). Half of our dd processes went into D state, as did some individual scsi_id commands that we ran manually on the same physical devices. Multipathd itself was also in D state.

The only way to restore the whole thing was to reset the server HBA connected to FABRIC2, after 2 hours of investigation.

No SCSI log message of any kind appeared during the outage (~2 hours), even though the SCSI timeout set on the physical devices is 60s and the HBA timeout is 14s. The /sys/block/sdX/device/state files were showing "running" even though the devices (well, half of them) were actually inaccessible.

Which leads me to:

1) assumption: it looks like the lpfc driver, following this SAN event, goes into a black hole mode, not returning any I/O error or anything else to the SCSI upper layer.
2) question: how come the SCSI timers don't trigger and declare the devices faulty? (The answer may be in the above assumption.)
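
For anyone wanting to reproduce the checks described above, here is a rough sketch of how they could be scripted. It is an illustration only, not what was run during the outage: the function names are made up, it assumes Python 3 (not the interpreter shipped with RHEL 5.3), and it assumes the usual /sys/block/sdX/device/{state,timeout} and /proc/<pid>/stat layout.

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a small sysfs/procfs file, or None."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def scsi_disk_status(sys_block="/sys/block"):
    """Map each sdX device to its reported SCSI state and command timeout."""
    status = {}
    for dev_dir in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        name = os.path.basename(dev_dir)
        status[name] = {
            "state": read_text(os.path.join(dev_dir, "device", "state")),
            "timeout": read_text(os.path.join(dev_dir, "device", "timeout")),
        }
    return status


def d_state_processes(proc="/proc"):
    """Return (pid, command) for every process in uninterruptible sleep (D)."""
    blocked = []
    for stat_path in glob.glob(os.path.join(proc, "[0-9]*", "stat")):
        text = read_text(stat_path)
        if not text:
            continue
        # Format is "pid (comm) state ..."; comm may itself contain spaces
        # or parentheses, so split around the last closing parenthesis.
        head, _, tail = text.rpartition(")")
        fields = tail.split()
        if fields and fields[0] == "D":
            pid = int(head.split(" ", 1)[0])
            comm = head.split("(", 1)[1]
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for name, info in scsi_disk_status().items():
        print(name, info["state"], info["timeout"])
    for pid, comm in d_state_processes():
        print(pid, comm)
```

A device listed as "running" here while dd or scsi_id against it shows up in the D-state list is the mismatch described above.
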
Any idea or tip on what could cause this, such as some FC SCN message not being handled well?

Regards,
Brem