Hello Martin, Thanks for your reply. Waiting for your patches I'd like to test it. And also, I'd like to tell more detail about this issue. > Is /dev/sdbk indeed a healthy path in this situation, or not? > Please run "multipathd show devices", too. Yes, /dev/sdbk can read and write, the condition is well, it fully recovered. Just because it not in pathvec and check_path have no chance to found it up and reinstate it in dm, and dm map have this path but queue_if_no_path still have, dm state is down, dm io blocked. >I wonder if your logs provide some more evidence how this situation >came to pass. I suspect that either a) uevents got lost, or that b) >multipathd failed to (re-)add the path while handling an uevent. It'd >be interesting to find out what it actually was. Firstly I also suspect like you, but actually there is not any uevent to remove the path in my debugging version with more log. But I found that after reconfig processed issue might appears, detect more, found that when storage network down block device not accessed reconfig will make paths disappear form multipathd because its status is bad. I think this might be a reason. Add more log for pathvec removing path, but not found other glue. >I believe that your problem would have been solved simply by removing >the "!is_daemon" condition above. I'd like to know if that's actually >the case, because your add_missing_path() function does basically the >same thing (plus calling pathinfo()). I have tried this but not work. Only need pathinfo and store pathinfo into pathvec can work, because check_path need full path info to finish check old code path only have pp->dev and pp->dev_t, check_path cannot use this to check path status. Dm io blocked issue not happen again, because this patch can repair this blocked very fast. Also dm io blocked issue actually is a low probability event. >However, the "!is_daemon" test is there for a reason (see b96dead > ("[multipathd] remove the retry login in uev_remove_path()"), and >that's why your patch isn't correct as-is. -----original----- 发件人: Martin Wilck [mailto:mwilck@xxxxxxxx] 发送时间: 2020年7月7日 17:43 收件人: wuchongyun (Cloud) <wu.chongyun@xxxxxxx>; Benjamin Marzinski <bmarzins@xxxxxxxxxx>; dm-devel@xxxxxxxxxx 抄送: guozhonghua (Cloud) <guozhonghua@xxxxxxx>; wangyong (Cloud) <wang.yongD@xxxxxxx>; changlimin (Cloud) <changlimin@xxxxxxx>; zhangguanghui (Cloud) <zhang.guanghui@xxxxxxx>; chengchiwen (Cloud) <chengchiwen@xxxxxxx> 主题: Re: [PATCH]libmultipath/dmparser: add missing path with good status when sync state with dm kernel Hello Chongyun, On Tue, 2020-07-07 at 03:08 +0000, Chongyun Wu wrote: > Hi Martin and Ben, > > Cloud you help to view below patch, thanks. > > > From b2786c81a78bf3868f300fd3177e852e718e7790 Mon Sep 17 00:00:00 > > 2001 > From: Chongyun Wu <wu.chongyun@xxxxxxx> > Date: Mon, 6 Jul 2020 11:22:21 +0800 > Subject: [PATCH] libmultipath/dmparser: add missing path with good > status when sync state with dm kernel > Nack, sorry. I know we have an issue in this area, but I would like to handle it differently. First of all, I want to get rid of disassemble_map() making modifications to the pathvec. The fact that disassemble_map() currently does this is an ugly layer violation IMO, and I don't like the idea of adding more of this on top. I'm currently preparing a patch set that addresses this (among other things). It will also address the issue of missing paths in pathvec, and I'd be very glad if you could give it a try in your test environment. > Add path back into deamon vecs->pathvecs when found path missing in > multipathd which can fix dm io blocked issue. > > Test environment: > several hosts; > each host have 100+ multipath, each multipath have 1 to 4 paths. > run up/down storage network loop script for days. > > Issue: > After several hours stop script, found some hosts have dm io blocked > issue: > slave block device access well, but its dm device blocked. > 36c0bfc0100a8d4a228214da50000003c dm-20 HUAWEI,XSG1 > size=350G features='1 queue_if_no_path' hwhandler='0' wp=rw > `-+- policy='round-robin 0' prio=1 status=enabled > |- 1:0:0:20 sdbk 67:224 failed ready running > multipathd show paths cannot found sdbk! Is /dev/sdbk indeed a healthy path in this situation, or not? Please run "multipathd show devices", too. I wonder if your logs provide some more evidence how this situation came to pass. I suspect that either a) uevents got lost, or that b) multipathd failed to (re-)add the path while handling an uevent. It'd be interesting to find out what it actually was. More notes below. > Test result: > With this patch, run script several days, io blocked issue > not found again. > > This patch can fix this issue: when found path only missing in > multipathd but still in dm, check the missing path status if ok > try to add it back, and checker have chance to reinstate this > path and the dm io blocked issue will disappear. > > Signed-off-by: Chongyun Wu <wu.chongyun@xxxxxxx> > --- > libmultipath/dmparser.c | 31 +++++++++++++++++++++++++++++++ > libmultipath/dmparser.h | 1 + > 2 files changed, 32 insertions(+) > > diff --git a/libmultipath/dmparser.c b/libmultipath/dmparser.c > index b856a07f..2f90b17c 100644 > --- a/libmultipath/dmparser.c > +++ b/libmultipath/dmparser.c > @@ -317,6 +317,12 @@ int disassemble_map(vector pathvec, char > *params, struct multipath *mpp, > /* Only call this in multipath client > mode */ > if (!is_daemon && store_path(pathvec, > pp)) > goto out1; I believe that your problem would have been solved simply by removing the "!is_daemon" condition above. I'd like to know if that's actually the case, because your add_missing_path() function does basically the same thing (plus calling pathinfo()). However, the "!is_daemon" test is there for a reason (see b96dead ("[multipathd] remove the retry login in uev_remove_path()"), and that's why your patch isn't correct as-is. Regards, Martin > + > + /* Try to add good status path back to > avoid > + * dm io blocked issue in special > condition. > + */ > + if(add_missing_path(pathvec, devname)) > + condlog(2, "Try to add missing > path %s failed", devname); > } else { > if (!strlen(pp->wwid) && > strlen(mpp->wwid)) > @@ -569,3 +575,28 @@ int disassemble_status(char *params, struct > multipath *mpp) > } > return 0; > } > + > +/* Add missing good status path back to multipathd */ > +int add_missing_path(vector pathvec, char *missing_dev) > +{ > + struct udev_device *udevice; > + struct config *conf; > + int ret = 0; > + > + condlog(2, "Cant't found path %s try to add it back if its > state is up.", > + missing_dev); > + > + udevice = udev_device_new_from_subsystem_sysname(udev, "block", > missing_dev); > + if (!udevice) { > + condlog(0, "%s: can't find path form udev", > missing_dev); > + return 1; > + } > + conf = get_multipath_config(); > + pthread_cleanup_push(put_multipath_config, conf); > + ret = store_pathinfo(pathvec, conf, > + udevice, DI_ALL | > DI_BLACKLIST, NULL); > + pthread_cleanup_pop(1); > + udev_device_unref(udevice); > + > + return ret; > +} > diff --git a/libmultipath/dmparser.h b/libmultipath/dmparser.h > index e1badb0b..515ca900 100644 > --- a/libmultipath/dmparser.h > +++ b/libmultipath/dmparser.h > @@ -1,3 +1,4 @@ > int assemble_map (struct multipath *, char *, int); > int disassemble_map (vector, char *, struct multipath *, int); > int disassemble_status (char *, struct multipath *); > +int add_missing_path(vector pathvec, char *missing_dev); -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel