Re: [PATCH 11/15] libmultipath: change failed path prio timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-01-16 at 20:18 -0600, Benjamin Marzinski wrote:
> multipath will try to get the priority from a PATH_DOWN path, if the
> path doesn't currently have a valid priority. However, if the
> priority
> code needs to contact the device to get the priority, this is likely
> to
> fail for PATH_DOWN paths.  This code dates back to when multipathd
> could
> not easily reload device tables with failed paths, so getting the
> correct priority was important to have a correctly configured device.
> Now multipathd can simply reload the device to move the path to the
> correct pathgroup when the path comes back up.  Since there are a
> number
> of prioritizers that don't require talking to the device, multipath
> shouldn't completely skip attempting to get the priority of these
> paths,
> but it should set a small timeout, so that it isn't hanging in the
> case where it needs to contact a device through a failed path.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> ---
>  libmultipath/discovery.c | 14 ++++++--------
>  libmultipath/prio.c      |  2 +-
>  2 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 1ab093f4..2c331db8 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1661,11 +1661,10 @@ get_state (struct path * pp, struct config
> *conf, int daemon, int oldstate)
>  }
>  
>  static int
> -get_prio (struct path * pp)
> +get_prio (struct path * pp, int timeout)
>  {
>  	struct prio * p;
>  	struct config *conf;
> -	int checker_timeout;
>  	int old_prio;
>  
>  	if (!pp)
> @@ -1684,11 +1683,8 @@ get_prio (struct path * pp)
>  			return 1;
>  		}
>  	}
> -	conf = get_multipath_config();
> -	checker_timeout = conf->checker_timeout;
> -	put_multipath_config(conf);
>  	old_prio = pp->priority;
> -	pp->priority = prio_getprio(p, pp, checker_timeout);
> +	pp->priority = prio_getprio(p, pp, timeout);
>  	if (pp->priority < 0) {
>  		/* this changes pp->offline, but why not */
>  		int state = path_offline(pp);
> @@ -2095,11 +2091,13 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>  
>  	 /*
>  	  * Retrieve path priority, even for PATH_DOWN paths if it has
> never
> -	  * been successfully obtained before.
> +	  * been successfully obtained before. If path is down don't
> try
> +	  * for too long.
>  	  */
>  	if ((mask & DI_PRIO) && path_state == PATH_UP && strlen(pp-
> >wwid)) {
>  		if (pp->state != PATH_DOWN || pp->priority ==
> PRIO_UNDEF) {
> -			get_prio(pp);
> +			get_prio(pp, (pp->state != PATH_DOWN)?
> +				     (conf->checker_timeout * 1000) :
> 10);
>  		}
>  	}
>  
> diff --git a/libmultipath/prio.c b/libmultipath/prio.c
> index 87de1f97..21c1b092 100644
> --- a/libmultipath/prio.c
> +++ b/libmultipath/prio.c
> @@ -14,7 +14,7 @@ unsigned int get_prio_timeout(unsigned int
> checker_timeout,
>  			      unsigned int default_timeout)
>  {
>  	if (checker_timeout)
> -		return checker_timeout * 1000;
> +		return checker_timeout;
>  	return default_timeout;
>  }
> 

This changes the unit of the first get_prio_timeout() argument from
seconds to milliseconds. While that's a good thing (it was questionable
design to have the same function take several "timeout" arguments in
different units), we should rename the argument there to avoid
confusion (checker_timeout's unit is seconds all around).

Apart from that, ACK.



--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux