On Tue, Apr 12, 2022 at 10:31:50AM +0000, Martin Wilck wrote: > On Mon, 2022-04-11 at 20:59 -0500, Benjamin Marzinski wrote: > > Some storage arrays can be accessed using multiple protocols at the > > same > > time. I've have customers request the ability to set different > > values > > for the path specific timeouts, like fast_io_fail_tmo, based on the > > protocol used to access the path. In order to make this possible, > > this > > patchset adds a new protocol subsection to the device subsection and > > the > > overrides section. This allows users to set a device config's path > > specific timeouts, such as dev_loss_tmo, fast_io_fail_tmo, and > > eh_deadline on a per-protocol basis. > > Sigh... I am not happy about this amount of additional complexity in > the multipath configuration. It is already so complicated that hardly > anyone really understands how it works. > > I fully agree that some properties, in particular fast_io_fail_tmo [1], > are rather properties of protocol or fabrics than a storage array. > But do we really need to differentiate by both "device" and "protocol"? > IOW, do users need different fast_io_fail_tmo value for the same > protocol, but different arrays? My feeling is that making this property > depend on a 2-D matrix of (protocol x storage) is overcomplicating > matters. And _if_ this is necessary, what do we gain by adding an > "overrides" on top? [2] I agree that setting fast_io_fail_tmo to different values based on both array and protocol is usually overkill. This is why I added it to the overrides section as well. If you just put it there, it will work for all devices. I assumed that would be the common case. It wouldn't make sense to have it be in the defaults section and override things in devices section, but it's a natural fit for the overrides section. Plus, since I added the protocol table to the hwentry, enabling it in the overrides section wasn't much new code. > What about adding simply adding "protocols" as a new top section in the > conf file, and have the per-protocol settings override built-in hwtable > settings for any array, but not explicit storage-device settings in the > conf file? I'm not really enamored with the idea of only working on the built-in hwtable. I feel like the people that want to tune things based on the protocol are the same sort of people that want tune things per array. More importantly, we don't have anything else in multipath that only applies to some of the device config entries. That change seems more confusing to me than adding a new subsection. The protocol subsection is visually part of the device config it is modifying. A separate protocol section that only modified some of the device configs seems less obvious. Plus we don't have a good way to code that. Also, what happens to merged configs, where some of the timeouts came from a builtin config, and some came from the user config. If you are really against the subsection idea, I would rather have the protocol section override all the device configs. Users would need to be o.k. with picking a protocol for which all their arrays have the same timeout values. But again, that should be the common case. > Regards, > Martin > > [1]: wrt dev_loss_tmo, I tend to think that the best value must be > found based on neither protocol nor array, but data center staff. I do agree that fast_io_fail_tmo and eh_deadline make more sense than dev_loss_tmo (especially since FC/iSCSI setups seem the most common, and iSCSI doesn't support dev_loss_tmo), but since the section is there, and dev_loss_tmo is a per-path timeout setting, I put it in. I thought about letting people change the checker type per protocol, but figured that could wait till someone asked for it. This would make less sense if we had a seperate top level protocol section. So would things like per-protocol manginal path handling, and other path specific things which could reasonably go in a protocol subsection of a device config. > [2]: I strongly dislike "overrides", in general. I wonder what we need > it for, except for quick experiments where people are too lazy to add > explicit settings for devices and/or multipaths. The biggest reason is that some of the builtin device configs do things like set no_path_retry to "queue". I know people have used it to override dev_loss_tmo and fast_io_fail_tmo, but the big one is no_path_retry. But in general, some builtin device configs make choices that aren't applicable for all environments, but are what the vendors have validated and want for the default. While you can call it lazy, it's always possible that a new builtin config will be added, or and existing one changed, and suddenly your devices are hanging instead of failing like expected when all the paths go down, because the devices are now configured with a different no_path_retry value. It's safer to simply disallow this possiblity. Also, the overrides section exists so that it is possible to set up a multipath config that will work within some environment's constraints, regardless of the hardware that is attached. For instance, in virtualized setups, no_path_retry "queue" can be really annoying. So virtualization solutions that use multipath can distribute a multipath.conf that overrides these settings on all types of possible devices, without having to go through and modify every problematic builtin device entry, and update their config as new entries are added. -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel