On Tue, 2018-04-03 at 16:29 -0500, Benjamin Marzinski wrote:
> On Tue, Apr 03, 2018 at 10:53:29PM +0200, Martin Wilck wrote:
> > On Tue, 2018-04-03 at 15:31 -0500, Benjamin Marzinski wrote:
> > > On Tue, Mar 27, 2018 at 11:50:52PM +0200, Martin Wilck wrote:
> > > > If the hardware handler isn't explicitly set, infer ALUA support
> > > > from the pp->tpgs attribute. Likewise, if ALUA is selected, but
> > > > not supported by the hardware, fall back to no hardware handler.
> > >
> > > Weren't you worried before about temporary ALUA failures? If you had
> > > a temporary failure while configuring a device that you explicitly
> > > set to be ALUA, then this would cause the device to be misconfigured?
> >
> > I believe that if TPGS is 0, the device will never be able to support
> > ALUA. The kernel also looks at the TPGS bits and won't try ALUA if
> > they are unset. Once the device is configured and actual ALUA
> > RTPG/STPG calls are performed, they may fail for a variety of
> > temporary reasons; I wanted to avoid resetting the prio algorithm to
> > "const" for such cases. That's my understanding, correct me if I'm
> > wrong.
>
> Devices that were not correctly supporting ALUA returned a value > 0
> for get_target_port_group_support, so detect_alua actually does all
> the work necessary to verify that it can get a priority. Without doing
> this, multiple devices that didn't support ALUA were being detected as
> supporting ALUA.

So, detect_alua() tests TPGS *and* tries an actual ALUA call, and sets
pp->tpgs to anything other than TPGS_NONE only if the latter is
successful. That's fine. My patch was looking at pp->tpgs, so it was
implicitly using this logic of detect_alua().

But does that guarantee that future alua->getprio() calls will never
fail at some later point in time? Maybe I misunderstood your original
proposition.
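For illustration, a minimal C sketch of the two behaviours under discussion, assuming a simplified model of a path: detection trusts the TPGS bits only if an actual ALUA command also succeeds, and the patch's hardware-handler selection infers "1 alua" from tpgs or falls back to "0". All names here (struct sketch_path, rtpg_works, sketch_detect_alua, sketch_select_hwhandler, the SKETCH_TPGS_* constants) are hypothetical and are not the libmultipath API; rtpg_works merely stands in for the result of a real REPORT TARGET PORT GROUPS call, and returning "0" when nothing is configured is an assumed default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* TPGS values as encoded in the 2-bit TPGS field of SCSI INQUIRY data. */
enum sketch_tpgs {
    SKETCH_TPGS_NONE = 0,     /* ALUA not supported */
    SKETCH_TPGS_IMPLICIT = 1, /* implicit ALUA only */
    SKETCH_TPGS_EXPLICIT = 2, /* explicit ALUA only */
    SKETCH_TPGS_BOTH = 3,     /* implicit and explicit */
};

/* Hypothetical path model; NOT libmultipath's struct path. */
struct sketch_path {
    int inquiry_tpgs; /* TPGS bits the device reports in INQUIRY */
    bool rtpg_works;  /* hypothetical: does a real RTPG command succeed? */
    int tpgs;         /* detection result, analogous to pp->tpgs */
};

/*
 * Detection as described in the thread: the TPGS bits alone are not
 * trusted; an actual ALUA command must also succeed before tpgs is set
 * to anything other than NONE.
 */
static void sketch_detect_alua(struct sketch_path *pp)
{
    assert(pp != NULL);
    pp->tpgs = SKETCH_TPGS_NONE;
    if (pp->inquiry_tpgs == SKETCH_TPGS_NONE)
        return; /* device does not claim ALUA support */
    if (!pp->rtpg_works)
        return; /* device claims ALUA, but the real call fails */
    pp->tpgs = pp->inquiry_tpgs;
}

/*
 * Handler selection as proposed in the patch: with no configured
 * handler, infer "1 alua" from tpgs; if "1 alua" is configured but the
 * path has no ALUA support, fall back to "0" (no hardware handler).
 * Any other configured handler is passed through unchanged.
 */
static const char *sketch_select_hwhandler(const char *configured,
                                           const struct sketch_path *pp)
{
    bool alua = pp->tpgs != SKETCH_TPGS_NONE;

    if (configured == NULL)
        return alua ? "1 alua" : "0"; /* "0" is an assumed default */
    if (strcmp(configured, "1 alua") == 0 && !alua)
        return "0";
    return configured;
}
```

The fallback in sketch_select_hwhandler() is the point of contention: if an explicitly configured "1 alua" always won instead, a path whose TPGS bits are set but whose RTPG fails would still get "1 alua", and, as the thread notes, the kernel would then refuse to set up the map in domap().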
What I'm saying is that resetting the prio algorithm from "alua" to
"const" because of an error code in get_prio() is wrong, because that
error code may be transient.

If we give "hardware_handler" config options precedence over ALUA
autodetection, and thus enforce hwhandler "1 alua" on devices that have
no ALUA support, domap() is guaranteed to fail, because the kernel
refuses to set up a map with a given hwhandler if any device doesn't
support that handler.

> By using retain_attached_hwhandler at all, we are implicitly requiring
> the scsi_dh_alua module to be loaded before devices with indeterminate
> configurations are discovered for them to work correctly, right? For
> instance, commit 715c48d93dd00930534ce6a55d0e3705466df5d6 did this for
> netapp devices, and that was in 2013. I don't see how this is
> different.

You're right, we are "implicitly requiring" this, sort of, but we have
no code that enforces early loading of the device handlers. We should
be shipping a modules-load.d file, a modprobe.d softdep, or something
similar that enforces this if we _really_ depend on it.

"Implicit requirements" are bad. We should either make the requirement
explicit or not hard-depend on it. So far I had been leaning toward the
latter. After all, SCSI device-handler support is configurable in the
kernel.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel