Re: multipath prio_callout broke from 5.2 to 5.3

On Mon, Apr 13, 2009 at 05:00:05AM -0400, John A. Sullivan III wrote:
> Thank you.  I'll detail our script and the logic behind it in a separate
> email in case it is helpful to others.
> 
> In the meantime, we have a critical problem where the script which was
> working perfectly in 5.2 is now broken in 5.3.  Is there any way to
> deconfuse the 5.3 multipathd or any other immediate solution? - John

What Christophe said is correct. In RHEL 5.3, multipath started copying
all of the necessary callouts into its own private namespace. It scans
through your config file and pulls out all the binaries.  However,
there are two problems that are affecting you.  First, it only pulls the
command, "/bin/bash" in your case, not the arguments, which for
you include the script to run.  Second, its private namespace only
consists of /sbin, /bin, /tmp, and a couple of virtual filesystems, like
/proc and /sys. (Well, actually there are a couple of others, like /etc,
that multipath needs to start up, but you shouldn't rely on them being
there all the time, since you can lose access to them if the device
they're on goes down.)
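
To make the first problem concrete, here is a rough shell illustration of
what "keep the command, drop the arguments" means for a callout string.
This is not multipathd's actual code, and the variable names are made up
for the example.

```shell
#!/bin/bash
# Illustration only: mimics "take the command, ignore the arguments".
# Not multipathd's code; the variable names are hypothetical.
callout='/bin/bash /usr/local/sbin/mpath_prio_ssi %n'

# Everything up to the first space is the command that gets copied.
copied="${callout%% *}"
# Everything after it is treated as arguments and is never copied.
not_copied="${callout#* }"

echo "copied into private namespace: ${copied}"
echo "left behind:                   ${not_copied}"
```

So bash itself is available inside the namespace, but the script it is
asked to run is not.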

There are two ways to deal with this.  The first is to rewrite the
prioritizer in C.  I realize that this is a pain, but it will be
necessary to run on RHEL 6 and newer Fedora machines, which use upstream's
prio functions instead of callout binaries.

The second, quicker way is to move your callout to /sbin and add a dummy
device section to make sure it gets picked up.

devices {
...
	device {
		vendor       "dummy"
		product      "dummy"
		prio_callout "/sbin/mpath_prio_ssi"
	}
}

This will cause multipathd to copy your script into the private
namespace, and everything should work, with one exception.
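
Moving the script into place could look something like the sketch below.
The helper name install_callout and its optional second argument are
assumptions for illustration. It also checks for a "#!" line, since the
script is now executed directly rather than passed to /bin/bash.

```shell
#!/bin/bash
# Sketch: install a callout script into a directory multipathd copies from.
# Usage: install_callout SRC [DESTDIR]   (DESTDIR defaults to /sbin; needs root)
# install_callout is a hypothetical helper, not part of multipath-tools.
install_callout() {
    local src="$1" destdir="${2:-/sbin}"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }

    # The script is run directly now, so it must start with a "#!" line.
    local first=""
    IFS= read -r first < "$src" || true
    case "$first" in
        '#!'*) ;;
        *) echo "missing #! line in $src" >&2; return 1 ;;
    esac

    # Copy with mode 0755 so it is executable in its new location.
    install -m 0755 "$src" "$destdir/$(basename "$src")"
}

# Example (as root): install_callout /usr/local/sbin/mpath_prio_ssi
```

Since your script is generated just before multipathd starts, the copy
would have to happen after generation and before the daemon is launched.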

bash is not a statically linked executable.  It links to libraries,
and multipathd doesn't make its own copies of them.  Under normal
operation this will work (/lib is also in multipathd's
private namespace). However, if you lose access to /lib, bash won't
work, and multipathd won't be able to restore access to your devices.
If you aren't planning on multipathing / or /lib you might choose to
ignore this (The exact same problem exists in 5.2).

I don't believe that there is a statically linked shell in RHEL 5.
This is another reason to convert your callout to a C program. Or
you can recompile bash with static linking.
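
If you want to check whether a particular shell or callout binary depends
on shared libraries, something along these lines should do. It is only a
sketch: needs_shared_libs is a made-up helper, and it assumes an ldd that
prints "name => path" lines for dynamically linked binaries.

```shell
#!/bin/bash
# Sketch: report whether a binary depends on shared libraries.
# Returns 0 if ldd lists at least one "=>" mapping, 1 if it lists none
# (this also happens if ldd itself is unavailable), and 2 if the file is
# missing or not executable.
# needs_shared_libs is a hypothetical helper name.
needs_shared_libs() {
    local bin="$1"
    [ -x "$bin" ] || return 2
    ldd "$bin" 2>/dev/null | grep -q '=>'
}

if needs_shared_libs /bin/bash; then
    echo "/bin/bash is dynamically linked; it needs its libraries at run time"
fi
```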

-Ben

> On Sun, 2009-04-12 at 09:13 +0200, christophe.varoqui@xxxxxxx wrote:
> > John,
> > 
> > Red Hat-shipped multipathd populates a private memory-backed filesystem at start-up with the binaries it needs.
> > Prio callouts of the form "$SHELL /path/to/myscript" seem to confuse the logic.
> > If your prio callout is of general interest, maybe we can port it upstream (as a shared object).
> > If you are interested, please describe and post the source.
> > 
> > Regards,
> > cvaroqui
> > 
> > ----- Original Message -----
> > From: "John A. Sullivan III" <jsullivan@xxxxxxxxxxxxxxxxxxx>
> > To: "device-mapper development" <dm-devel@xxxxxxxxxx>
> > Sent: Sunday, April 12, 2009 06:07:55 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> > Subject: Re:  multipath prio_callout broke from 5.2 to 5.3
> > 
> > On Sat, 2009-04-11 at 23:54 -0400, John A. Sullivan III wrote:
> > > Hello, all.  We are facing a serious problem with dm-multipath after our
> > > upgrade.  We use a bash script to set priorities for failover.  We
> > > understand multipathd cannot use a bash script directly so it has been
> > > carefully crafted to use only internal commands and is loaded as:
> > > 
> > > prio_callout            "/bin/bash /usr/local/sbin/mpath_prio_ssi %n"
> > > 
> > > This has been working perfectly fine.  We upgraded our test lab to
> > > CentOS 5.3, device-mapper-multipath.x86_64 0.4.7-23.el5_3.2, kernel
> > > 2.6.29.1 (the 2.6.18 default causes a kernel panic with iSCSI).
> > > Suddenly, it is breaking.  /var/log/messages is filled with:
> > > 
> > > Apr 11 23:17:15 kvm01 multipathd: cannot open /sbin/dasd_id : No such file or directory
> > > Apr 11 23:17:15 kvm01 multipathd: cannot open /sbin/gnbd_import : No such file or directory
> > > Apr 11 23:17:15 kvm01 multipathd: [copy.c] cannot open /sbin/dasd_id
> > > Apr 11 23:17:15 kvm01 multipathd: cannot copy /sbin/dasd_id in ramfs : No such file or directory
> > > Apr 11 23:17:15 kvm01 multipathd: [copy.c] cannot open /sbin/gnbd_import
> > > Apr 11 23:17:15 kvm01 multipathd: cannot copy /sbin/gnbd_import in ramfs : No such file or directory
> > > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sdc
> > > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sdd
> > > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > > Apr 11 23:17:15 kvm01 multipathd: error calling out /bin/bash /usr/local/sbin/mpath_prio_ssi sde
> > > Apr 11 23:17:15 kvm01 multipathd: /bin/bash exitted with 127
> > > 
> > > The first several messages are expected but not the latter ones.  If we
> > > run the call from the command line, e.g.,
> > > "/bin/bash /usr/local/sbin/mpath_prio_ssi sdc" it works perfectly fine.
> > > 
> > > What has changed and how do we fix it? I'll include a sample script
> > > below.  The script is dynamically created just before launching
> > > multipathd:
> > > 
> > > #!/bin/bash
> > > # if not passed any device name, return a priority of 0
> > > if [ -z "${1}" ];then
> > >         echo 0
> > >         exit
> > > fi
> > > 
> > > DEVS="lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:17f534f0-74af-e61b-a716-b8ac8e219dac-lun-0 -> ../../sdj
> > > lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:47c5e722-10d3-66c7-a952-d3d79732da9c-lun-0 -> ../../sdr
> > > lrwxrwxrwx 1 root root  9 Apr 11 23:13 ip-172.x.x.30:3260-iscsi-iqn.1986-03.com.sun:02:520e823d-342c-6668-9477-fad130b148d7-lun-0 -> ../../sdn"
> > > 
> > > LIST="172.x.x.78:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->99
> > > 172.x.x.46:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->49
> > > 172.x.x.62:3260-iscsi-iqn.1986-03.com.sun:02:adb0cf37-9a23-6fc9-922a-eb4540bee1c9->24"
> > > 
> > > FOUND=0
> > > IFSORIG=${IFS}
> > > IFS=$'\n'
> > > for LINE in ${DEVS}
> > > do
> > >         ENTRY=${LINE%/${1}}
> > >         if [ ${#ENTRY} -ne ${#LINE} ];then # We found the line
> > >                 FOUND=1
> > >                 break
> > >         fi
> > > done
> > > if [ "$FOUND" = "0" ];then  # This is not an iSCSI device
> > >         echo 0
> > >         exit
> > > fi
> > > DEV="${ENTRY##* ip-}"
> > > #DEV="${DEV%% ->*}" # the pattern changed in CentOS 5.3
> > > #DEV="$(echo ${DEV} | sed 's/-lun-[0-9][0-9]* ->.*//')"
> > > DEV="${DEV%%-lun-[0-9]* ->*}"
> > > PRIORITY=0
> > > for LINE in ${LIST}
> > > do
> > >         DISK=${LINE%->*}
> > >         if [ "${DEV}" = "${DISK}" ];then
> > >                 PRIORITY="${LINE##*->}"
> > >                 break
> > >         fi
> > > done
> > > echo ${PRIORITY}
> > > 
> > > I did notice the semantics of /dev/disk/by-path changed and we adapted
> > > to that.  We were planning to move this to production on Thursday so
> > > this has thrown a huge spanner in the works.  Any help would be greatly
> > > appreciated.  Thanks - John
> > 
> > I've just noticed that my console is filled with:
> > 
> > /bin/bash: /usr/local/sbin/mpath_prio_ssi: No such file or directory
> > 
> > but it is indeed there and owned by root and executable.  I've quintuple
> > checked! Has multipathd been changed so it cannot read anything from
> > disk even if invoked from within bash? Thanks - John
> -- 
> John A. Sullivan III
> Open Source Development Corporation
> +1 207-985-7880
> jsullivan@xxxxxxxxxxxxxxxxxxx
> 
> http://www.spiritualoutreach.com
> Making Christianity intelligible to secular society
> 
> 
> --
> dm-devel mailing list
> dm-devel@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/dm-devel

