Re: multipath-tools 0.7.4 failure to remove device

Martin Wilck <mwilck@xxxxxxxx> · Fri, 12 Jan 2018 21:35:39 +0100

On Fri, 2018-01-12 at 09:38 +0100, Julian Andres Klode wrote:
> 
> and then we get I/O error on the device and it's rendered unusable.
> It's also crashing in uev_pathfail_check() occasionally because
> find_path_by_devt() returns NULL, so I applied the following patch to
> at least continue, but that's obviously wrong. We get a udev event for
> a device which does not exist in /dev (but it should)?

Adding Guan, as the pathfail check is from his code.

> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1090,6 +1090,11 @@ uev_pathfail_check(struct uevent *uev, s
>  	lock(&vecs->lock);
>  	pthread_testcancel();
>  	pp = find_path_by_devt(vecs->pathvec, devt);
> +	if (!pp) {
> +		condlog(3, "%s: Cannot find path by dm path %s", uev->kernel, devt);
> +		FREE(devt);
> +		goto out;
> +	}
>  	r = io_err_stat_handle_pathfail(pp);
>  	lock_cleanup_pop(vecs->lock);

You need to release the lock in the error path, too. I'd prefer checking
for a NULL path argument in io_err_stat_handle_pathfail() instead. See
attachment.
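
A minimal sketch of that suggestion, using simplified stand-in types:
struct path, struct vectors, handle_pathfail(), cleanup_mutex() and
pathfail_check() are hypothetical names, not multipathd's real code. It
assumes that multipathd's lock()/lock_cleanup_pop() helpers boil down to
pthread_mutex_lock() plus pthread_cleanup_push()/pthread_cleanup_pop(1).
Because the callee tolerates NULL, the caller needs no early exit, and the
mutex is released on every path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for multipathd's types; not the real structs. */
struct path {
	const char *dev;
};

struct vectors {
	pthread_mutex_t mutex;
};

/* Callee tolerates a NULL path, as suggested in the reply. */
static int handle_pathfail(struct path *pp)
{
	if (pp == NULL) {
		fprintf(stderr, "%s: called with empty path\n", __func__);
		return 1;
	}
	/* ... marginal-path bookkeeping for pp would go here ... */
	return 0;
}

/* Cleanup handler: unlocks the mutex passed in as arg. */
static void cleanup_mutex(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/*
 * Caller: lock, call the handler, unlock via the cleanup handler.
 * There is no early "goto out" between push and pop, so the mutex is
 * released whether or not the path lookup returned NULL.
 */
static int pathfail_check(struct vectors *vecs, struct path *found)
{
	int r;

	assert(vecs != NULL);
	pthread_cleanup_push(cleanup_mutex, &vecs->mutex);
	pthread_mutex_lock(&vecs->mutex);
	r = handle_pathfail(found);	/* found may be NULL */
	pthread_cleanup_pop(1);		/* 1: run cleanup_mutex, i.e. unlock */
	return r;
}
```

After pathfail_check(&v, NULL) returns 1, a subsequent
pthread_mutex_trylock(&v.mutex) succeeds, showing the lock was not leaked.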

I'm assuming that you are not using the "marginal path" logic. In
general I don't like the fact that PATH_FAILED events are handled at
all in multipathd if this logic is inactive; that code path is only
needed for this purpose. But that's just a side note.

> Jan 12 09:17:52 autopkgtest kernel: device-mapper: multipath: Failing path 8:16.
> Jan 12 09:17:52 autopkgtest kernel: sd 3:0:0:1: [sdb] Synchronizing SCSI cache
> Jan 12 09:17:52 autopkgtest multipath[6909]: 8:16: cannot find block device
> Jan 12 09:17:52 autopkgtest multipath[6909]: 8:16: Empty device name
> Jan 12 09:17:52 autopkgtest multipath[6909]: 8:16: Empty device name
> Jan 12 09:17:52 autopkgtest multipath[6909]: get_udev_device: failed to look up 8:16 with type 1
> Jan 12 09:17:52 autopkgtest multipath[6909]: dm-0: usable paths found
> Jan 12 09:17:53 autopkgtest iscsid[649]: Connection2:0 to [target: iqn.2016-11.foo.com:target.iscsi, portal: 127.0.0.1,3260] through [iface: default] is shutdown.

> We can see that it correctly removed the first device (sda), except,
> well, it seems to try again and fail at the point where it would have
> crashed. But when it tries to look up the second one, it fails.

> Given that this works in 0.6.4, I think it's a bug that appeared later
> on, but I can't really pinpoint the source of it.

Well, it may be because of the locking being broken by your patch.
If you look at the journal you sent, multipathd never prints a single
message after the removal of sda, until it says

Jan 12 09:18:37 autopkgtest multipathd[1980]: exit (signal)

That makes me think it hangs somehow, which could well be explained by
the lock not being released. Please retry with the attached patch.
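
A minimal illustration of that failure mode, with hypothetical handlers
rather than multipathd's code: an early "goto out" that jumps past the
unlock leaves the mutex held, so the next blocking pthread_mutex_lock()
(for example, the one taken for the next uevent) would wait forever. The
sketch probes with pthread_mutex_trylock() instead, which returns EBUSY
rather than hanging. Note also that POSIX leaves it undefined to leave a
pthread_cleanup_push()/pthread_cleanup_pop() block with goto, so if the
out: label lies past lock_cleanup_pop(), the quoted patch is doubly
suspect.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Buggy variant: the early exit skips the unlock, leaking the mutex. */
static int leaky_handler(pthread_mutex_t *m, const void *pp)
{
	int r = 1;

	pthread_mutex_lock(m);
	if (pp == NULL)
		goto out;	/* BUG: jumps past pthread_mutex_unlock() */
	r = 0;
	pthread_mutex_unlock(m);
out:
	return r;
}

/* Fixed variant: a single exit path that always unlocks. */
static int fixed_handler(pthread_mutex_t *m, const void *pp)
{
	int r = 1;

	pthread_mutex_lock(m);
	if (pp != NULL)
		r = 0;
	pthread_mutex_unlock(m);
	return r;
}

/*
 * Non-blocking probe: returns 1 if the mutex is free, 0 if it is still
 * held (EBUSY), which is where a blocking lock call would hang.
 */
static int lock_is_free(pthread_mutex_t *m)
{
	int rc = pthread_mutex_trylock(m);

	if (rc == 0) {
		pthread_mutex_unlock(m);
		return 1;
	}
	assert(rc == EBUSY);
	return 0;
}
```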

We are seeing the *multipath* messages ([6909]), which are printed by
multipath during udev rule processing because the map still holds
references to the deleted path.
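
The "cannot find block device" and "failed to look up 8:16" messages fit a
lookup by device number after the kernel has already torn down sdb. A
small sketch of such a lookup, assuming it resolves roughly the way
libudev's udev_device_new_from_devnum() does, via
/sys/dev/block/MAJOR:MINOR; devt_sysfs_path() and devt_exists() are
hypothetical helpers, not multipath-tools functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Build the sysfs path for a block device number, e.g. 8:16 for sdb.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int devt_sysfs_path(char *buf, size_t len,
			   unsigned int maj, unsigned int min)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/sys/dev/block/%u:%u", maj, min);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Returns 1 if maj:min is still registered as a block device in sysfs,
 * 0 otherwise (e.g. after the SCSI device has been deleted).
 */
static int devt_exists(unsigned int maj, unsigned int min)
{
	char path[64];

	if (devt_sysfs_path(path, sizeof(path), maj, min) != 0)
		return 0;
	return access(path, F_OK) == 0;
}
```

Once sdb is gone, devt_exists(8, 16) would return 0 even though the
multipath map may still list 8:16 as one of its paths.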

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
commit c4d48c633b0825941024a34acf2304a6f5a2d17d (HEAD -> upstream)
Author: Martin Wilck <mwilck@xxxxxxxx>
Date:   Fri Jan 12 21:21:49 2018 +0100

    libmultipath: deal with NULL path in pathfail handler
    
    This avoids a crash for paths which are already deleted.
    
    Reported-by: Julian Andres Klode <julian.klode@xxxxxxxxxxxxx>

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 75a6df67c207..d2d2276a523e 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -315,6 +315,10 @@ int io_err_stat_handle_pathfail(struct path *path)
 	struct timespec curr_time;
 	int res;
 
+	if (path == NULL) {
+		io_err_stat_log(1, "%s: called with empty path", __func__);
+		return 1;
+	}
 	if (path->io_err_disable_reinstate) {
 		io_err_stat_log(3, "%s: reinstate is already disabled",
 				path->dev);
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel