Re: [PATCH 4/6] libmultipath: fix suspended devs from failed reloads

On Thu, May 11, 2017 at 10:26:52PM +0200, Martin Wilck wrote:
> On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote:
> > When multipath reloads a device, it can either fail while loading the
> > new table or while resuming the device. If it fails while resuming the
> > device, the device can get stuck in the suspended state. To fix this,
> > multipath needs to resume the device again so that it can continue
> > using the old table.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  libmultipath/devmapper.c | 19 ++++++++++++++++++-
> >  libmultipath/devmapper.h |  1 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 2c4a13a..69b634b 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, char *params, int flush)
> >  	if (r)
> >  		r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush,
> >  				 1, udev_flags, 0);
> > -	return r;
> > +	if (r)
> > +		return r;
> > +
> > +	if (dm_is_suspended(mpp->alias))
> > +		dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, 1,
> > +			     udev_flags, 0);
> > +	return 0;
> >  }
> 
> Why would the second DM_DEVICE_RESUME call succeed if the first one
> failed?

Because if the first resume fails, device-mapper rolls back to the
original table.

The specific way that someone found this was by running a

multipathd resize

when only some of the paths had been rescanned and had picked up the new,
larger size. Since the first path had changed size (that's all that
multipath looks at to find the new size), the multipath device tried to
reload with the larger size. However, when it tried to resume with the
larger size, it failed, since some of the devices were too small. The
second resume lets it try again with the original table.
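
To make the sequence concrete, here is a rough toy model of it in C. This
is only a sketch of the behavior described above, not device-mapper or
libmultipath code: struct toy_dev and the toy_* functions are all
hypothetical names, and the model simply assumes that a failed resume
drops the new table and leaves the device suspended. Return values follow
the patch's convention (nonzero on success).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in libmultipath
 * or libdevmapper. Sizes are in arbitrary units. */
struct toy_dev {
	unsigned long active_size;   /* size of the table currently in use */
	unsigned long inactive_size; /* newly loaded table, 0 if none */
	unsigned long smallest_path; /* size of the smallest underlying path */
	bool suspended;
};

/* Load a new table into the inactive slot (assumed to always succeed). */
static void toy_load(struct toy_dev *d, unsigned long new_size)
{
	d->inactive_size = new_size;
}

/* Resume the device. Returns 1 on success, 0 on failure. */
static int toy_resume(struct toy_dev *d)
{
	if (d->inactive_size != 0) {
		unsigned long want = d->inactive_size;

		d->suspended = true;  /* device is suspended for the swap */
		d->inactive_size = 0; /* new table is consumed either way */
		if (want > d->smallest_path)
			return 0;     /* failed: still suspended, old table kept */
		d->active_size = want;
	}
	d->suspended = false;
	return 1;
}

/* Reload with the fallback from the patch: if the first resume fails
 * and the device is left suspended, resume again on the old table. */
static int toy_reload(struct toy_dev *d, unsigned long new_size)
{
	assert(d != NULL);

	toy_load(d, new_size);
	if (toy_resume(d))
		return 1;
	if (d->suspended)
		toy_resume(d); /* no inactive table left, so the old one is used */
	return 0;
}
```

In this model, calling toy_reload() with a size larger than smallest_path
reports failure but leaves the device running on its old table rather than
suspended, which is the case the patch is after.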

Of course, that isn't the only way that this could fail, and it's nice
to be able to gracefully continue using the old table instead of just
being suspended.

-Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel
