Jiaojianbing,

On Mon, 2018-07-02 at 01:06 +0000, Jiaojianbing wrote:
> > [I've added Hannes, Ben and Douglas to the recipient list to fill
> > in knowledge from the past that I may lack].
> >
> > tl;dr summary: We've got 3 issues:
> >
> > 1) Why does multipath, in reinstate_paths(), try to reinstate
> >    paths which are known to be down?
> > 2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is
> >    not used (that looks like a bug to me).
> > 3) In Jiaojianbing's environment, dead paths that have been
> >    removed on the target and were already marked "offline" may
> >    appear as "running" after rescan-scsi-bus.sh invocation.
> >
> > Furthermore,
> > 4) perhaps rescan-scsi-bus.sh should replace suboptimal
> >    "multipath" calls with multipathd cli commands (or better even,
> >    we multipath-tools people should eventually finish the
> >    "delegate to multipathd" work).
>
> It's my negligence, command "multipath" is not the one in
> rescan-scsi-bus.sh, but another one called every five minutes by
> process "test.sh".
> It means there are two processes, one is rescan-scsi-bus.sh, another
> is test.sh which call multipath every five minutes.

Please describe your setup bottom-up. You have two scripts running
periodically, one calling "rescan-scsi-bus.sh" and one calling
"multipath", and they are (or can be) running at the same time? How
frequently are they running?

Be aware that "multipath" is not a monitoring command; it basically
causes a reconfiguration. It's not recommended to run it periodically.

Of course running "multipath" shouldn't cause a system hang, but in
your case I still think the root problem is that devices that can't
respond to IO are seen in "running" state by the kernel. If that
happens, other processes are allowed, actually supposed, to do probing
on these devices. But it's hard to say more without knowing what
exactly is going on.

Also, please consider updating to a more recent version of the tools.
dm-devel is a mailing list for discussing upstream issues, and your
versions of both multipath and sg3_utils are rather ancient. I guess
you're using some older distribution, in which case you may want to
engage with your distro's support team.

I don't think we'll make much progress without detailed logs from the
kernel, rescan-scsi-bus.sh, and multipath. Please activate SCSI
logging with MLCOMPLETE=1|ERROR=4|SCAN_BUS=4, run rescan-scsi-bus.sh
with the -d switch, set the verbosity of both multipathd and the
"multipath" command to at least 3, put the results on a pastebin
somewhere, and provide us with links.

> In the scene, rescan-scsi-bus.sh will consume more larger time than
> the scene without calling "test.sh". The reason is that all
> "systemd-udevd" process
> are in D state who send io to device mapper device, such as dm-105.

If that's the case, please also run "udevadm control -l debug" and
provide udev logs. We need to know which udev commands are hanging.

Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
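
For reference, the SCSI logging request in this message
(MLCOMPLETE=1|ERROR=4|SCAN_BUS=4) is given as symbolic facility names,
while the kernel takes a single integer. The sketch below shows one way
to pack those three settings into that integer. It is not part of the
original mail. It assumes the 3-bit field layout from the kernel's
drivers/scsi/scsi_logging.h and that "SCAN_BUS" refers to the SCAN
field; the function name scsi_logging_level is made up for this
illustration.

```python
# Bit offsets of the 3-bit per-facility fields, following the layout in
# the kernel's drivers/scsi/scsi_logging.h (assumed here; "SCAN_BUS" is
# taken to mean the SCAN field).
SHIFTS = {
    "ERROR": 0,
    "TIMEOUT": 3,
    "SCAN_BUS": 6,
    "MLQUEUE": 9,
    "MLCOMPLETE": 12,
    "LLQUEUE": 15,
    "LLCOMPLETE": 18,
    "HLQUEUE": 21,
    "HLCOMPLETE": 24,
    "IOCTL": 27,
}


def scsi_logging_level(levels):
    """Pack per-facility log levels (0..7) into one integer mask.

    Hypothetical helper for illustration only.
    """
    mask = 0
    for name, level in levels.items():
        if name not in SHIFTS:
            raise KeyError(f"unknown SCSI logging facility: {name}")
        if not 0 <= level <= 7:
            raise ValueError(f"level for {name} must be in 0..7")
        mask |= level << SHIFTS[name]
    return mask


# The settings requested in the mail.
requested = {"MLCOMPLETE": 1, "ERROR": 4, "SCAN_BUS": 4}
print(scsi_logging_level(requested))  # 4356 = (1 << 12) | (4 << 6) | 4
```

On kernels built with SCSI logging support, the resulting value would
typically be written to /proc/sys/dev/scsi/logging_level as root; that
step is left out here because it changes system state.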