* James Smart

> We had a lot of conversations on what to have the transport do after
> connectivity was lost to a device. Suffice to say - the answer was to
> remove the device. The dev_loss_tmo value was the compromise
> between the kernel architectural position, and what FC drivers had
> always managed and hidden from the kernel in the past.
>
> Unfortunately, even though DM has known of this behavior for a long
> time (it's existed since 2.6.<early teens>), no-one has bothered to
> update DM to support it. One train of thought is: fixing it for FC
> doesn't address the issue, as other transports may still encounter
> it. It's a DM thing, and should stay this way to ensure that DM fixes
> it.

So basically you're forcing breakage on users to coerce the DM folks to
fix their end? Maybe it's time to rethink that strategy. Seeing that it
hasn't been fixed in such a long time, it seems unlikely that they would
start bothering now. SuSE and RH work around it anyway, so everyone
who's obedient enough to run the enterprise distros their storage vendor
tells them to won't have any problems. Many of the DM folks are employed
by SuSE, RH, and various storage vendors - go figure.

> You noted that the FC transport, in SLES10 and RHEL5, added a patch
> that allowed for the scsi targets not to be torn down when
> dev_loss_tmo timed out. This had little to do with DM, and everything
> to do with reuse-after-free issues on mid-layer data structures that
> were released as part of the teardown, as well as the timing of the
> upstream reuse patches vs what the distro kernels could accept. But
> DM certainly benefited from its behavior.
>
> I'd rather that DM got fixed so that it supports the necessary
> architectural behavior. But, we've lived with the distro-specific
> behavior as well, so it's not a strong sentiment.

I'm just a simple user.
To me the most important thing is that it - and by «it» I'm referring to
the whole bundle of DM, SCSI, HBA driver, and the rest of the system -
actually works. With no way of disabling dev_loss_tmo, it doesn't. It
will break after intermittent failures, exactly when you need it to work
the most. Knowing that the SCSI FC transport does the Right Thing isn't
really any consolation.

The patch seems like a rather simple fix. Quick and dirty, sure, but it
would actually help out the likes of me who are putting this stuff in
production. And if it defaulted to remove_on_dev_loss=0, it wouldn't
really be intrusive either.

It seems that it will take a while to get this properly fixed (both in
DM and the -EEXIST issue), so what I'm asking for is just a way to make
it work in the interim.

Regards
-- 
Tore Anderson