On Dec 20, 2017, at 3:04 PM, Cathy Zhou <cathy.zhou@xxxxxxxxxx> wrote:
Lee-man,
Thanks for your comments. See inline:
On 12/20/2017 1:47 PM, The Lee-Man wrote:
On Friday, December 8, 2017 at 4:30:36 PM UTC-8, The Lee-Man wrote:
On Thursday, December 7, 2017 at 11:49:53 AM UTC-8,
cathy.zhou@xxxxxxxxxx wrote:
From: Cathy Zhou <Cathy.Zhou@xxxxxxxxxx>
Co-existence of iscsid and iscsistart can lead to unexpected iSCSI boot
behavior:
We ran into an iSCSI netboot failure when multiple interfaces are
configured with dhcp. The problem is that iscsistart is not designed to
work together with iscsid. When an interface successfully gets a dhcp
offer, the iscsiroot script is run, which starts the iscsistart service
to establish the iSCSI session. Because iscsid already exists, the
iscsistart service's attempt to set up its own mgmt ipc fails. Instead,
the request to log in to the iSCSI target is handled by the mgmt ipc of
iscsid. After iscsistart finishes its login attempt, it eventually sends
a stop_event_loop request to stop the mgmt process. As a result, it
terminates iscsid.
I don't think this behavior is expected. Further, we believe the
restarts of the iscsid service in iscsiroot.sh for multiple interfaces
could also interfere with each other and race. That could also cause
unexpected behavior.
Specifically, in our multiple interface iSCSI boot case, this is what
happened:
a. eth0 successfully got the dhcp offer, the iscsiroot script was run and
eventually ran "systemd-run" to start the "oneshot" iscsistart service.
b. Before step a completed, the iscsiroot script was also run for eth1.
It checked the status of the first iscsistart service instance, which
was still "activating". So the iscsistart service was restarted, and
that killed the first instance. Note that in this case, the
stop_event_loop request had not been sent, and iscsid was not
terminated. As a result, the second iscsistart instance also failed
with the "existing session" error, as the half-baked session already
existed in iscsid.
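To make the race in step b concrete, here is a simplified model of the per-interface status check. It is only a sketch for illustration, not the code in iscsiroot.sh: the function name login_action is hypothetical, and only the "activating" case comes from the description above. The handling of the other states is an assumption.

```shell
# Simplified model of the per-interface decision described in step b.
# login_action is a hypothetical name; only the "activating" case is
# taken from the description above, the other states are assumptions.
login_action() {
    case "$1" in
        active)            echo "none" ;;     # login already finished
        activating|failed) echo "restart" ;;  # restart kills an in-flight login
        *)                 echo "start" ;;    # no instance exists yet
    esac
}

login_action inactive     # eth0: the first instance is started
login_action activating   # eth1: the first instance is restarted mid-login
```

Since iscsid still holds the half-finished session when the restart happens, the second login attempt runs into the "existing session" error.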
Based on iscsistart(8), iscsistart's primary use is to start sessions
used for iSCSI root boot, and it is meant to work independently of
iscsid. Making the two work together would need coordination between
them, which would be complicated and unnecessary.
No, I believe iscsistart was meant to run *instead* of iscsid, not
independently.
That was what I meant: iscsistart is not to run together with iscsid.
Therefore, to fix the issue, we can either remove the use of iscsid and
solely use iscsistart, or remove iscsistart and use iscsid and iscsiadm
to manage the iSCSI sessions.
Our fix chooses the former. We tested the change in our setup.
Signed-off-by: Cathy Zhou <Cathy.Zhou@xxxxxxxxxx>
---
modules.d/95iscsi/cleanup-iscsi.sh | 2 +-
modules.d/95iscsi/iscsiroot.sh | 25 +------------------------
modules.d/95iscsi/module-setup.sh | 30 ------------------------------
modules.d/95iscsi/parse-iscsiroot.sh | 10 ----------
4 files changed, 2 insertions(+), 65 deletions(-)
diff --git a/modules.d/95iscsi/cleanup-iscsi.sh b/modules.d/95iscsi/cleanup-iscsi.sh
index bfc8aefc..e97d65ac 100755
--- a/modules.d/95iscsi/cleanup-iscsi.sh
+++ b/modules.d/95iscsi/cleanup-iscsi.sh
@@ -1,4 +1,4 @@
#!/bin/sh
-[ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && killproc iscsiuio
+[ -e /sys/module/bnx2i ] && killproc iscsiuio
Why are you removing the DRACUT_SYSTEMD checks?
Because before my change, iscsiuio was started as a service, as a
dependency of the iscsid service. My change removes the iscsid service,
so iscsiuio needs to be started explicitly for iSCSI offload with the
bnx2i driver.
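As an illustration of starting iscsiuio explicitly but only once, the marker-file guard used in iscsiroot.sh can be sketched as a small helper. The name start_once is hypothetical and not part of the patch. The marker path and the command are passed as parameters so the sketch can be tried outside an initramfs.

```shell
# Hypothetical helper (not part of the patch): run a command at most once,
# recording success in a marker file.
start_once() {
    marker=$1
    shift
    # Already started earlier: nothing to do.
    [ -e "$marker" ] && return 0
    # Run the command; only create the marker if it succeeded.
    "$@" || return 1
    : > "$marker"
}

# Assumed usage in the initramfs, mirroring the guard in the patch:
#   [ -e /sys/module/bnx2i ] && start_once /tmp/iscsiuio-started iscsiuio
```

Unlike the guard in the patch, this sketch only creates the marker when the command succeeds. That is a design choice of the sketch, not something the patch does.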
diff --git a/modules.d/95iscsi/iscsiroot.sh b/modules.d/95iscsi/iscsiroot.sh
index aefd263d..c7f1c474 100755
--- a/modules.d/95iscsi/iscsiroot.sh
+++ b/modules.d/95iscsi/iscsiroot.sh
@@ -36,7 +36,7 @@ iroot=${iroot#:}
# figured out a way how to check whether this is built-in or not
modprobe crc32c 2>/dev/null
-if [ -z "${DRACUT_SYSTEMD}" ] && [ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
+if [ -e /sys/module/bnx2i ] && ! [ -e /tmp/iscsiuio-started ]; then
iscsiuio
> /tmp/iscsiuio-started
fi
@@ -117,11 +117,6 @@ handle_netroot()
mkdir -p /etc/iscsi
ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
> /tmp/iscsi_set_initiator
- if [ -n "$DRACUT_SYSTEMD" ]; then
- systemctl try-restart iscsid
- # FIXME: iscsid is not yet ready, when the service is :-/
- sleep 1
- fi
fi
if [ -z "$iscsi_initiator" ]; then
@@ -138,11 +133,6 @@ handle_netroot()
mkdir -p /etc/iscsi
ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
> /tmp/iscsi_set_initiator
- if [ -n "$DRACUT_SYSTEMD" ]; then
- systemctl try-restart iscsid
- # FIXME: iscsid is not yet ready, when the service is :-/
- sleep 1
- fi
fi
@@ -163,11 +153,6 @@ handle_netroot()
if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
mkdir -p /etc/iscsi
ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
- if [ -n "$DRACUT_SYSTEMD" ]; then
- systemctl try-restart iscsid
- # FIXME: iscsid is not yet ready, when the service is :-/
- sleep 1
- fi
fi
# FIXME $iscsi_protocol??
@@ -234,14 +219,6 @@ if [ "$netif" != "timeout" ] && getargbool 1 rd.iscsi.waitnet; then
all_ifaces_setup || exit 0
fi
-if [ "$netif" = "timeout" ] && all_ifaces_setup; then
- # s.th. went wrong and the timeout script hits
- # restart
- systemctl restart iscsid
- # damn iscsid is not ready after unit says it's ready
- sleep 2
-fi
-
if getargbool 0 rd.iscsi.firmware -d -y iscsi_firmware ; then
if [ "$netif" = "timeout" ] || [ "$netif" = "online" ]; then
handle_firmware
diff --git a/modules.d/95iscsi/module-setup.sh b/modules.d/95iscsi/module-setup.sh
index 04937b5b..1d185a84 100755
--- a/modules.d/95iscsi/module-setup.sh
+++ b/modules.d/95iscsi/module-setup.sh
@@ -199,36 +199,6 @@ install() {
inst "$moddir/iscsiroot.sh" "/sbin/iscsiroot"
if ! dracut_module_included "systemd"; then
inst "$moddir/mount-lun.sh" "/bin/mount-lun.sh"
- else
- inst_multiple -o \
- $systemdsystemunitdir/iscsi.service \
- $systemdsystemunitdir/iscsid.service \
- $systemdsystemunitdir/iscsid.socket \
- $systemdsystemunitdir/iscsiuio.service \
- $systemdsystemunitdir/iscsiuio.socket \
- iscsiadm iscsid
-
- mkdir -p "${initdir}/$systemdsystemunitdir/sockets.target.wants"
- for i in \
- iscsiuio.socket \
- ; do
- ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/sockets.target.wants/${i}"
- done
-
- mkdir -p "${initdir}/$systemdsystemunitdir/basic.target.wants"
- for i in \
- iscsid.service \
- ; do
- ln_r "$systemdsystemunitdir/${i}" "$systemdsystemunitdir/basic.target.wants/${i}"
- done
-
- # Make sure iscsid is started after dracut-cmdline and ready for the initqueue
- mkdir -p "${initdir}/$systemdsystemunitdir/iscsid.service.d"
- (
- echo "[Unit]"
- echo "After=dracut-cmdline.service"
- echo "Before=dracut-initqueue.service"
- ) > "${initdir}/$systemdsystemunitdir/iscsid.service.d/dracut.conf"
I'm not sure why you've removed all of this.
Because the above is added for the iscsid service, which is no longer
needed.
fi
inst_dir /var/lib/iscsi
dracut_need_initqueue
diff --git a/modules.d/95iscsi/parse-iscsiroot.sh b/modules.d/95iscsi/parse-iscsiroot.sh
index 43b2e088..c8c66ccf 100755
--- a/modules.d/95iscsi/parse-iscsiroot.sh
+++ b/modules.d/95iscsi/parse-iscsiroot.sh
@@ -116,11 +116,6 @@ if arg=$(getarg rd.iscsi.initiator -d iscsi_initiator=) && [ -n "$arg" ] && ! [
if ! [ -e /etc/iscsi/initiatorname.iscsi ]; then
mkdir -p /etc/iscsi
ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
- if [ -n "$DRACUT_SYSTEMD" ]; then
- systemctl try-restart iscsid
- # FIXME: iscsid is not yet ready, when the service is :-/
- sleep 1
- fi
fi
fi
@@ -133,11 +128,6 @@ if [ -z $iscsi_initiator ] && [ -f /sys/firmware/ibft/initiator/initiator-name ]
mkdir -p /etc/iscsi
ln -fs /run/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi
> /tmp/iscsi_set_initiator
- if [ -n "$DRACUT_SYSTEMD" ]; then
- systemctl try-restart iscsid
- # FIXME: iscsid is not yet ready, when the service is :-/
- sleep 1
- fi
fi
fi
-- 2.11.1
So in general you've stopped starting or restarting the iscsid
service?
Yes. In fact, we no longer start the iscsid service at all. The whole
fix is to remove iscsid and run only iscsistart.
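One way to sanity-check the result is to look at a file listing of the generated initramfs and confirm that no iscsid binary or unit is left in it. The following is only a sketch: listing_has_iscsid is a hypothetical helper, and the image path in the usage comment is an assumption that depends on the distribution.

```shell
# Hypothetical helper: reads a file listing on stdin and succeeds if a
# path ending in iscsid, iscsid.service or iscsid.socket appears in it.
listing_has_iscsid() {
    grep -Eq '(^|/)iscsid(\.service|\.socket)?$'
}

# Assumed usage (image path varies by distribution):
#   if lsinitrd /boot/initramfs-$(uname -r).img | listing_has_iscsid; then
#       echo "iscsid is still included in the initramfs"
#   fi
```

iscsistart and iscsiuio do not match the pattern, so only leftovers of iscsid itself are reported.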