> [I've added Hannes, Ben and Douglas to the recipient list to fill in knowledge
> from the past that I may lack].
>
> tl;dr summary: We've got 3 issues:
>
> 1) Why does multipath, in reinstate_paths(), try to reinstate paths which are
> known to be down?
> 2) rescan-scsi-bus.sh can call "multipath" even if "-m" switch is not used (that
> looks like a bug to me).
> 3) In Jiaojianbing's environment, dead paths that have been removed on the
> target and were already marked "offline" may appear as "running"
> after rescan-scsi-bus.sh invocation.
>
> Furthermore,
> 4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
> calls with multipathd cli commands (or better even, we multipath-tools people
> should eventually finish the "delegate to multipathd" work).

That was my mistake: the "multipath" command is not called by
rescan-scsi-bus.sh, but by a separate script, "test.sh", which runs it every
five minutes. So there are two processes: rescan-scsi-bus.sh, and test.sh,
which calls multipath every five minutes. In this scenario,
rescan-scsi-bus.sh takes much longer than it does when test.sh is not
running. The reason is that all "systemd-udevd" processes that send IO to
device mapper devices, such as dm-105, are in D state.

So it comes down to 2 issues:
1) Why does multipath, in reinstate_paths(), try to reinstate paths which are
known to be down?
2) While the script "rescan-scsi-bus.sh" is running, another process calling
the "multipath" command may cause errors.

> On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
> > > > Dear Christophe,
> > > > when dm-105 is in one state of below, paths of dm-105 will change
> > > > to active if we run command of multipath.
> > >
> > > Could you be more specific please? What multipath command did you
> > > run?
> > > Which version of multipath-tools are you running?
> >
> > command is "multipath", which can run in shell as below:
> > #multipath
>
> ...
> and if I understand correctly, originally the problem occurred while running
> rescan_scsi_bus.sh. Please also state the version of sg3_utils you are using.

The version of sg3_utils is sg3_utils-libs-1.37-14.x86_64. As described
above, the problem may be caused by the additional process that periodically
calls the "multipath" command.

> >
> > And the version: multipath-tools v0.4.9 (05/33, 2016)
>
> Well, that's ancient. But latest multipath-tools still has the same code.
>
> > >
> > > > I check code of multipath, it sends message "reinstate_path
> > > > pathname"
> > > > to kernel in routine reinstate_paths when status of pathgroup =
> > > > "PGSTATE_ENABLED/PGSTATE_UNDEF" and path's state =
> > > > "PSTATE_FAILED".
> > > > why command of multipath do above action to all dm devices?
> > > > actually,
> > > > parts of these paths are already offline or failed which can't be
> > > > recovered. Maybe we can check these devices' status by sending io
> > > > to these sd device at first. according to return of io, multipath
> > > > send reinstate to running devices and do nothing to failed
> > > > devices?
> > >
> > > I see this code in reinstate_paths():
> > >
> > > vector_foreach_slot (pgp->paths, pp, j) {
> > > 	if (pp->state != PATH_UP &&
> > > 	    (pgp->status == PGSTATE_DISABLED ||
> > > 	     pgp->status == PGSTATE_ACTIVE))
> > > 		continue;
> > >
> > > 	if (pp->dmstate == PSTATE_FAILED) {
> > > 		if (dm_reinstate_path(mpp->alias, pp->dev_t))
> > > 			condlog(0, "%s: error reinstating",
> > > 				pp->dev);
> > > 	}
> > > }
> > >
> > > The reinstate command is only sent for paths which are either in
> > > PATH_UP state, or belong to a PGSTATE_ENABLED path group. I admit
> > > I'm unsure why we try at all to reinstate paths that we know are
> > > down. This is 13-year-old code.
> > >
> > > Interestingly, the state of your paths changes from "faulty offline"
> > > to "ready running".
> > > So it appears that these paths are actually _not_ down.
> > > Just the reinstate seems to have failed on them.
> > >
> > > multipathd -v3 logs and possibly kernel logs would be helpful to
> > > understand what was going on in that situation.
> >
> > Sorry, maybe my two multipath status sample confused you. They are
> > just sample. Actually, I run command "rescan-scsi-bus" to clear all
> > mapped scsi devices by iscsid in host when all of LUNS in remote IPSAN
> > are removed.
> > In process of running rescan-scsi-bus, if command "multipath" is
> > running, the status of dm's path will change from failed to active in
> > some moment as below. If IO is sent to dm-105, the process who sends
> > io will be in D state.
> > # multipath -ll
> > 36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI ,XSG1
> > size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
> > `-+- policy='service-time 0' prio=1 status=active
> >   `- 18:0:0:101 sdku 67:288 active faulty running
>
> The strange part here is that the device is considered "running". This is the
> state of the kernel device. If the LUNs are actually _removed_ as you say, the
> device should be gone, or at least marked "offline".
>
> Apparently the SCSI bus SCAN via iSCSI still showed the LUN in a workable state.
> For multipath this translates to PATH_UP. Thus even if the above code didn't
> have the (pgp->status == PGSTATE_DISABLED || pgp->status == PGSTATE_ACTIVE)
> clause, the reinstate would have been attempted by multipath. This looks like
> a low-level problem in your SCSI or iSCSI layer to me.
>
> This looks like the actual problem to me. multipath aside, if the path appears to
> be "running", any Linux process could try to send IO down to it and be stuck, as
> you say.
>
> >
> > I want to know whether command "multipath" is reasonable in
> > reinstate_paths().
> >
> > And maybe we should not call "multipath" in process of running
> > rescan-scsi-bus ?
>
> Normally rescan-scsi-bus.sh should call "multipath" only if the "-m|--multipath"
> switch was used. I quickly scanned through the code and didn't find a call to
> "multipath" (with no options) which wasn't guarded by the [ -n "$mp_enable" ]
> condition. (FTR: there is a call to "multipath -f" from main->flushmpaths if
> "-f|--flush" is set).
>
> Again, please double-check your version of sg3_utils, and perhaps run "bash -x
> rescan-scsi-bus.sh" to figure out the call chain which runs the "multipath"
> command.
>
> Thanks,
> Martin
>
> > >
> > > Regards
> > > Martin
> > >
> > > --
> > > Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> > > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> > > HRB 21284 (AG Nürnberg)
> > >
>
> --
> Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
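
For reference, the skip condition in the reinstate_paths() loop quoted above
can be modelled as a small standalone predicate. This is only a sketch for
reasoning about the logic, not code from multipath-tools: the enums are
simplified stand-ins that reuse the names from the thread, and
would_reinstate() and self_check() are hypothetical helpers. It shows that a
path whose checker state is not PATH_UP still gets a reinstate attempt when
its path group is PGSTATE_ENABLED or PGSTATE_UNDEF and its dm state is
PSTATE_FAILED.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the multipath-tools enums (assumption: only the
 * values relevant to the quoted condition are modelled). */
enum path_state { PATH_DOWN, PATH_UP };
enum pg_status { PGSTATE_UNDEF, PGSTATE_ENABLED, PGSTATE_DISABLED,
		 PGSTATE_ACTIVE };
enum dm_state { PSTATE_UNDEF, PSTATE_FAILED, PSTATE_ACTIVE };

/* Hypothetical helper: true if the quoted loop body would reach
 * dm_reinstate_path() for this combination of states. */
static bool would_reinstate(enum path_state state, enum pg_status pg,
			    enum dm_state dmstate)
{
	/* Same test as the "continue" in the quoted loop. */
	if (state != PATH_UP &&
	    (pg == PGSTATE_DISABLED || pg == PGSTATE_ACTIVE))
		return false;

	/* Only paths the kernel has marked failed are reinstated. */
	return dmstate == PSTATE_FAILED;
}

/* Hypothetical self-check spelling out the cases discussed in the thread. */
static void self_check(void)
{
	/* Down path in an enabled or undefined group: reinstate is tried. */
	assert(would_reinstate(PATH_DOWN, PGSTATE_ENABLED, PSTATE_FAILED));
	assert(would_reinstate(PATH_DOWN, PGSTATE_UNDEF, PSTATE_FAILED));

	/* Down path in an active or disabled group: skipped. */
	assert(!would_reinstate(PATH_DOWN, PGSTATE_ACTIVE, PSTATE_FAILED));
	assert(!would_reinstate(PATH_DOWN, PGSTATE_DISABLED, PSTATE_FAILED));

	/* A path reported as PATH_UP is reinstated in any group. */
	assert(would_reinstate(PATH_UP, PGSTATE_ACTIVE, PSTATE_FAILED));
}
```

Calling self_check() from a small main() is enough to exercise the cases. The
last assertion corresponds to the "active faulty running" situation discussed
above: if the path checker reports PATH_UP, the path-group clause does not
matter.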