Re: Reply: why does the multipath command send reinstate messages to all of a dm device's paths

On 2018-06-28 09:38 AM, Martin Wilck wrote:
[I've added Hannes, Ben and Douglas to the recipient list to fill in
knowledge from the past that I may lack].

tl;dr summary: We've got 3 issues:

  1) Why does multipath, in reinstate_paths(), try to reinstate paths
which are known to be down?
  2) rescan-scsi-bus.sh can call "multipath" even if the "-m" switch is
not used (that looks like a bug to me).
  3) In Jiaojianbing's environment, dead paths that have been removed on
the target and were already marked "offline" may appear as "running"
after rescan-scsi-bus.sh invocation.

Furthermore,
  4) perhaps rescan-scsi-bus.sh should replace suboptimal "multipath"
calls with multipathd CLI commands (or, even better, we multipath-tools
people should eventually finish the "delegate to multipathd" work).


On Thu, 2018-06-28 at 06:35 +0000, Jiaojianbing wrote:
Dear Christophe,
when dm-105 is in one of the states below, the paths of dm-105 change
to active if we run the multipath command.

Could you be more specific please? What multipath command did you
run?
Which version of multipath-tools are you running?

The command is "multipath", run in the shell as below:
#multipath

... and if I understand correctly, originally the problem occurred while
running rescan-scsi-bus.sh. Please also state the version of sg3_utils
you are using.


And the version:  multipath-tools v0.4.9 (05/33, 2016)

Well, that's ancient. But the latest multipath-tools still has the same
code.


  I checked the multipath code: in the routine reinstate_paths() it
sends the message "reinstate_path <pathname>" to the kernel when the
path group status is PGSTATE_ENABLED/PGSTATE_UNDEF and the path's state
is PSTATE_FAILED.
Why does the multipath command do this for all dm devices? Actually,
some of these paths are already offline or failed and can't be
recovered. Maybe we could first check these devices' status by sending
IO to the sd devices, and then, depending on the IO result, have
multipath send reinstate to the running devices and do nothing for the
failed ones?
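The probe-before-reinstate idea above could look roughly like this. It is a minimal sketch under stated assumptions: struct demo_path, simulated_probe() and reinstate_if_probe_ok() are hypothetical names, not part of multipath-tools, and simulated_probe() merely stands in for a real test IO (for example a small read) against the sd device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified path record (not multipath-tools' struct path). */
struct demo_path {
	const char *dev;     /* kernel device name, e.g. "sdku" */
	bool dm_failed;      /* device-mapper currently marks the path failed */
	bool lun_present;    /* simulation only: does the target still export the LUN? */
};

/* Hypothetical probe: returns true if a test IO to the path succeeded. */
typedef bool (*probe_fn)(const struct demo_path *pp);

/* Stand-in for real IO: "succeeds" only if the LUN still exists. */
static bool simulated_probe(const struct demo_path *pp)
{
	return pp->lun_present;
}

/*
 * Proposal sketch: reinstate a dm-failed path only after a test IO
 * succeeded on it; paths whose probe fails are left failed.
 * Returns the number of paths that would have been reinstated.
 */
static int reinstate_if_probe_ok(struct demo_path *paths, size_t n,
				 probe_fn probe)
{
	int reinstated = 0;

	assert(probe != NULL);
	for (size_t i = 0; i < n; i++) {
		struct demo_path *pp = &paths[i];

		if (!pp->dm_failed)
			continue;       /* already active in dm: nothing to do */
		if (!probe(pp))
			continue;       /* test IO failed: do not reinstate */
		/* Real code would send "reinstate_path <dev>" to the kernel here. */
		pp->dm_failed = false;
		reinstated++;
	}
	return reinstated;
}
```

One design caveat: a real probe would need a timeout, because IO sent to an unresponsive device can leave the caller stuck in D state, which is exactly the symptom being reported.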

I see this code in reinstate_paths():

		vector_foreach_slot (pgp->paths, pp, j) {
			/* skip paths that are not up if their group is disabled or active */
			if (pp->state != PATH_UP &&
			    (pgp->status == PGSTATE_DISABLED ||
			     pgp->status == PGSTATE_ACTIVE))
				continue;

			/* ask the kernel to reinstate paths that dm has marked failed */
			if (pp->dmstate == PSTATE_FAILED) {
				if (dm_reinstate_path(mpp->alias, pp->dev_t))
					condlog(0, "%s: error reinstating",
						pp->dev);
			}
		}

The reinstate command is only sent for paths which are either in
PATH_UP state, or belong to a PGSTATE_ENABLED (or PGSTATE_UNDEF) path
group. I admit I'm unsure why we try to reinstate paths that we know
are down at all. This is 13-year-old code.
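To make the condition in the reinstate_paths() excerpt easier to check, it can be restated as a standalone predicate. This is a sketch only: the DEMO_* enums and would_reinstate() are hypothetical stand-ins that mirror the PATH_UP, PGSTATE_* and PSTATE_FAILED names from the excerpt, not the real multipath-tools definitions.

```c
#include <stdbool.h>

/* Simplified stand-ins for the multipath-tools constants in the excerpt. */
enum demo_path_state { DEMO_PATH_UP, DEMO_PATH_DOWN };
enum demo_pg_status  { DEMO_PG_UNDEF, DEMO_PG_ENABLED,
		       DEMO_PG_DISABLED, DEMO_PG_ACTIVE };
enum demo_dm_state   { DEMO_PSTATE_ACTIVE, DEMO_PSTATE_FAILED };

/*
 * Mirrors the two tests in the excerpt: a path that is not up is
 * skipped if its group is DISABLED or ACTIVE; otherwise a reinstate
 * is sent whenever dm reports the path as failed.
 */
static bool would_reinstate(enum demo_path_state state,
			    enum demo_pg_status pg_status,
			    enum demo_dm_state dmstate)
{
	if (state != DEMO_PATH_UP &&
	    (pg_status == DEMO_PG_DISABLED || pg_status == DEMO_PG_ACTIVE))
		return false;
	return dmstate == DEMO_PSTATE_FAILED;
}
```

Evaluating it for a down path shows the behaviour under discussion: would_reinstate(DEMO_PATH_DOWN, DEMO_PG_ENABLED, DEMO_PSTATE_FAILED) is true, so a path known to be down in an enabled group still gets a reinstate message.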

Interestingly, the state of your paths changes from "faulty offline"
to "ready running". So it appears that these paths are actually _not_
down; only the reinstate seems to have failed on them.

multipathd -v3 logs and possibly kernel logs would be helpful to
understand what was going on in that situation.

     Sorry, my two multipath status samples may have confused you; they
are just samples. What I actually do is run "rescan-scsi-bus" to clear
all SCSI devices that iscsid has mapped on the host after all LUNs on
the remote IP SAN have been removed.
While rescan-scsi-bus is running, if the "multipath" command runs, the
status of the dm device's paths changes from failed to active at some
point, as below. If IO is then sent to dm-105, the process issuing the
IO ends up in D state.
# multipath -ll
36d0d04b100b8cba665a187f0000000f9 dm-105 HUAWEI  ,XSG1
size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
   `- 18:0:0:101 sdku 67:288  active faulty running

The strange part here is that the device is considered "running". This
is the state of the kernel device. If the LUNs are actually _removed_
as you say, the device should be gone, or at least marked "offline".

Apparently the SCSI bus scan via iSCSI still showed the LUN in a
workable state. For multipath this translates to PATH_UP. Thus even if
the above code didn't have the (pgp->status == PGSTATE_DISABLED ||
pgp->status == PGSTATE_ACTIVE) clause, multipath would still have
attempted the reinstate. This looks like a low-level problem in your
SCSI or iSCSI layer to me.
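To make the "running" versus "offline" distinction concrete, here is a minimal sketch that reads a SCSI device's kernel state from sysfs, assuming the attribute /sys/block/<dev>/device/state, which holds strings such as "running" or "offline". The enum, classify_state() and read_device_state() are hypothetical names, and the coarse up/down mapping is a simplification for illustration, not multipath-tools' actual logic (which also runs a path checker).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical coarse view of a SCSI device, for illustration only. */
enum demo_dev_view { DEMO_DEV_UP, DEMO_DEV_DOWN, DEMO_DEV_GONE, DEMO_DEV_UNKNOWN };

/* Map a kernel SCSI device state string to the coarse view (simplified). */
static enum demo_dev_view classify_state(const char *state)
{
	if (strcmp(state, "running") == 0)
		return DEMO_DEV_UP;        /* kernel still accepts IO for it */
	if (strcmp(state, "offline") == 0 ||
	    strcmp(state, "transport-offline") == 0 ||
	    strcmp(state, "deleted") == 0)
		return DEMO_DEV_DOWN;
	return DEMO_DEV_UNKNOWN;           /* e.g. "blocked", "quiesce", "created" */
}

/*
 * Read a sysfs-style state file such as /sys/block/sdku/device/state.
 * A file that cannot be opened is treated as "device gone"
 * (a simplification: permission errors would land here too).
 */
static enum demo_dev_view read_device_state(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return DEMO_DEV_GONE;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return DEMO_DEV_UNKNOWN;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';    /* strip the trailing newline */
	return classify_state(buf);
}
```

Under this model, a device whose LUN was removed on the target should read as DEMO_DEV_DOWN or DEMO_DEV_GONE; a device that the kernel still reports as "running" reads as DEMO_DEV_UP, which is why it looks usable to any tool, multipath included.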

This looks like the actual problem to me. multipath aside, if the path
appears to be "running", any Linux process could try to send IO down to
it and be stuck, as you say.


   I want to know whether what the "multipath" command does in
reinstate_paths() is reasonable.


And maybe we should not call "multipath" while rescan-scsi-bus is
running?

Normally rescan-scsi-bus.sh should call "multipath" only if the
"-m|--multipath" switch was used. I quickly scanned through the code
and didn't find a call to "multipath" (with no options) which wasn't
guarded by the [ -n "$mp_enable" ] condition. (FTR: there is a call to
"multipath -f" from main->flushmpaths if "-f|--flush" is set).

Again, please double-check your version of sg3_utils, and perhaps run
"bash -x rescan-scsi-bus.sh" to figure out the call chain which runs
the "multipath" command.

Thanks,
Martin


Hi,
My upstream version of rescan-scsi-bus.sh is attached. The last change was
the --ignore-rev option from Gris Ge <fge@xxxxxxxxxx>. He has sent several
cleanups in the last year, usually via Hannes' GitHub site for sg3_utils.

My ChangeLog entry to that script (since sg3_utils 1.42) is:

  - rescan-scsi-bus.sh: harden code
    - fixes from Suse; bump version
    - bump version to 20180615
    - add to install list in Makefile, hope it does
      not clash with other package providing it
    - add --ignore-rev to ignore revision change

If there are no further changes it will be like that in sg3_utils-1.43
revision 780.

Doug Gilbert

Attachment: rescan-scsi-bus.sh
Description: application/shellscript

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
