On Tue, 2022-04-12 at 13:47 -0500, Benjamin Marzinski wrote: > On Tue, Apr 12, 2022 at 10:31:50AM +0000, Martin Wilck wrote: > > > I agree that setting fast_io_fail_tmo to different values based on > both > array and protocol is usually overkill. If we doubt that there is a serious use case for this level of differentiation, I think we shouldn't implement it. Doing that would violate the KISS principle. > > > > What about adding simply adding "protocols" as a new top section in > > the > > conf file, and have the per-protocol settings override built-in > > hwtable > > settings for any array, but not explicit storage-device settings in > > the > > conf file? > > I'm not really enamored with the idea of only working on the built-in > hwtable. > I feel like the people that want to tune things based on the > protocol are the same sort of people that want tune things per array. Hm, maybe we are misunderstanding each other. Let me give an example what I meant. We've got dev_loss = inifinity for ONTAP in the builtin hwtable. My idea would be that users could override this in the protocols section for all devices using a certain protocol: protocols { protocol { type scsi:fcp dev_loss_tmo 120 } } This would take precedence over the built-in hwtable, but not over an explicit config file entry: devices { device { vendor NETAPP product LUN # With the dev_loss_tmo line below, RDAC would use 300 for # every protocol. Without it, it would use 120 for FC and # "infinity" for other protocols. dev_loss_tmo 300 } } Admittedly, the problem on the implementation side is that we don't distinguish between built-in hwentries and those from explicit configuration. Changing this would be considerable effort; perhaps more effort than what you did in your patch set. I haven't thought it through. > More importantly, we don't have anything else in multipath that only > applies to some of the device config entries. That change seems more > confusing to me than adding a new subsection. The main change would be to differentiate between built-in and user- configured hardware properties. I hope most users would be able to understand the concept. > The protocol subsection is > visually part of the device config it is modifying. A separate > protocol > section that only modified some of the device configs > seems less > obvious. Plus we don't have a good way to code that. Also, what > happens > to merged configs, where some of the timeouts came from a builtin > config, and some came from the user config. To clarify once more: this is what I meant, built-in configs would be overridden, user configs wouldn't be. This is different from "defaults", as "defaults" don't override hardware-specific built-ins. > If you are really agains the subsection idea, > I would rather have the protocol section override > all the device configs. Users would need to be o.k. with picking a > protocol for which all their arrays have the same timeout values. But > again, that should be the common case. The question is whether a "protocol" entry should override device settings, or vice versa (as in my hypothetical ONTAP example above). We seem to agree that that device subsection would implement a level of complexity that hardly any user needs. If this is the case, we'd just need to determine the precedence between "devices" and "protocols" sections. If we determine that "protocols" should always take precedence, we might as well allow it under "overrides" only. We'd need no "protocols" section in that case, and no differentiation between built-in and user-configured hwentries. > > [1]: wrt dev_loss_tmo, I tend to think that the best value must be > > found based on neither protocol nor array, but data center staff. > > I do agree that fast_io_fail_tmo and eh_deadline make more sense than > dev_loss_tmo (especially since FC/iSCSI setups seem the most common, > and > iSCSI doesn't support dev_loss_tmo), but since the section is there, > and > dev_loss_tmo is a per-path timeout setting, I put it in. > ... I'm fine with treating dev_loss_tmo as a protocol/path property. > > [2]: I strongly dislike "overrides", in general. I wonder what we > > need > > it for, except for quick experiments where people are too lazy to > > add > > explicit settings for devices and/or multipaths. > > The biggest reason is that some of the builtin device configs do > things > like set no_path_retry to "queue". You don't need to use "overrides" for that: devices { device { vendor .* product .* no_path_retry 75 } } You can follow up with more device entries that define exceptions for the general rule above. Am I missing something? AFAICT the only thing you can do with "overrides" but not with the trick above is override actual hardware-specific user configs, and I have a hard time figuring out why someone would work out detailed device-specific configs just to override them again with a big hammer. Regards, Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel