On Tue, Apr 12, 2022 at 08:47:38PM +0000, Martin Wilck wrote: > On Tue, 2022-04-12 at 13:47 -0500, Benjamin Marzinski wrote: > > On Tue, Apr 12, 2022 at 10:31:50AM +0000, Martin Wilck wrote: > > > > > I agree that setting fast_io_fail_tmo to different values based on > > both > > array and protocol is usually overkill. > > If we doubt that there is a serious use case for this level of > differentiation, I think we shouldn't implement it. Doing that would > violate the KISS principle. > > > > > > > What about adding simply adding "protocols" as a new top section in > > > the > > > conf file, and have the per-protocol settings override built-in > > > hwtable > > > settings for any array, but not explicit storage-device settings in > > > the > > > conf file? > > > > I'm not really enamored with the idea of only working on the built-in > > hwtable. > > I feel like the people that want to tune things based on the > > protocol are the same sort of people that want tune things per array. > > Hm, maybe we are misunderstanding each other. Let me give an example > what I meant. We've got dev_loss = inifinity for ONTAP in the builtin > hwtable. My idea would be that users could override this in the > protocols section for all devices using a certain protocol: > > protocols { > protocol { > type scsi:fcp > dev_loss_tmo 120 > } > } > > This would take precedence over the built-in hwtable, but not over an > explicit config file entry: > > devices { > device { > vendor NETAPP > product LUN > # With the dev_loss_tmo line below, RDAC would use 300 for > # every protocol. Without it, it would use 120 for FC and > # "infinity" for other protocols. > dev_loss_tmo 300 > } > } > > Admittedly, the problem on the implementation side is that we don't > distinguish between built-in hwentries and those from explicit > configuration. Changing this would be considerable effort; perhaps more > effort than what you did in your patch set. I haven't thought it > through. > > > More importantly, we don't have anything else in multipath that only > > applies to some of the device config entries. That change seems more > > confusing to me than adding a new subsection. > > The main change would be to differentiate between built-in and user- > configured hardware properties. I hope most users would be able to > understand the concept. > > > The protocol subsection is > > visually part of the device config it is modifying. A separate > > protocol > > section that only modified some of the device configs > > seems less > > obvious. Plus we don't have a good way to code that. Also, what > > happens > > to merged configs, where some of the timeouts came from a builtin > > config, and some came from the user config. > > To clarify once more: this is what I meant, built-in configs would be > overridden, user configs wouldn't be. This is different from > "defaults", as "defaults" don't override hardware-specific built-ins. But what do you call a device config that is the result of merging (via merge_hwe()) a built-in and a non-built-in config. Do we really want to track that some of the values of this merged config need to check the protocol section, and some don't? We could remove merging identical configs, but that simply makes it harder for users to figure out how their device will be configured from the configuration output. I understand your idea. I'd just rather that it worked on all the device configs, instead of only the built-in ones. I think overriding only the built-in configs is needlessly complicated, both from a coding and from an explaining point of view. > > If you are really agains the subsection idea, > > I would rather have the protocol section override > > all the device configs. Users would need to be o.k. with picking a > > protocol for which all their arrays have the same timeout values. But > > again, that should be the common case. > > The question is whether a "protocol" entry should override device > settings, or vice versa (as in my hypothetical ONTAP example above). We > seem to agree that that device subsection would implement a level of > complexity that hardly any user needs. If this is the case, we'd just > need to determine the precedence between "devices" and "protocols" > sections. If we determine that "protocols" should always take > precedence, we might as well allow it under "overrides" only. We'd need > no "protocols" section in that case, and no differentiation between > built-in and user-configured hwentries. That seems reasonable. > > > [1]: wrt dev_loss_tmo, I tend to think that the best value must be > > > found based on neither protocol nor array, but data center staff. > > > > I do agree that fast_io_fail_tmo and eh_deadline make more sense than > > dev_loss_tmo (especially since FC/iSCSI setups seem the most common, > > and > > iSCSI doesn't support dev_loss_tmo), but since the section is there, > > and > > dev_loss_tmo is a per-path timeout setting, I put it in. > > ... > > I'm fine with treating dev_loss_tmo as a protocol/path property. > > > > [2]: I strongly dislike "overrides", in general. I wonder what we > > > need > > > it for, except for quick experiments where people are too lazy to > > > add > > > explicit settings for devices and/or multipaths. > > > > The biggest reason is that some of the builtin device configs do > > things > > like set no_path_retry to "queue". > > You don't need to use "overrides" for that: > > devices { > device { > vendor .* > product .* > no_path_retry 75 > } > } > You can follow up with more device entries that define exceptions for > the general rule above. Am I missing something? > > AFAICT the only thing you can do with "overrides" but not with the > trick above is override actual hardware-specific user configs, and I > have a hard time figuring out why someone would work out detailed > device-specific configs just to override them again with a big hammer. Fair enough. I added the overrides section before you made paths have a vector of device configs. Back then, there was no way to have a device config that would work like your above example. My original idea was to be able to have a special device section like this: device { all_devs yes no_path_retry 75 } that would get merged with all the device sections. The overrides section was the compromise after my original idea was NAKed. We probably could deprecate the overrides section now that we have a vector of device configs. But then we shouldn't go putting the protocol stuff there. -Ben > Regards, > Martin -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel