On 4/29/20 2:11 AM, Simone Lazzaris wrote:
> On Tuesday, 28 April 2020 18:41:27 CEST, Mike Christie wrote:
>
>> Could you send me:
>>
>> 1. The /var/log/messages for the initiator when you do IO and see those
>> lock messages.
>
> On the initiator (XenServer 7.1, which is based on CentOS AFAIK) the
> /var/log/messages is empty.
> I (sporadically) see:
> Apr 29 09:00:36 xs-n1 systemd[1]: Starting Multipath Count Service...
> Apr 29 09:00:36 xs-n1 systemd[1]: Started Multipath Count Service.
> Apr 29 09:00:36 xs-n1 systemd[1]: Started Session 146 of user root.
> Apr 29 09:00:36 xs-n1 systemd[1]: Starting Session 146 of user root.
> Apr 29 09:00:40 xs-n1 multipathd: dm-3: remove map (uevent)
> Apr 29 09:00:40 xs-n1 multipathd: dm-3: devmap not registered, can't remove
> Apr 29 09:00:40 xs-n1 multipathd: dm-3: remove map (uevent)
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="PBD.get_all_records"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_uuid"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_name_label"];
> Apr 29 09:00:40 xs-n1 mpathalert: [debug|xs-n1|2 ||mscgen] mpathalert=>xapi [label="host.get_all_records"];
>
>> 2. The output of
>>
>> From one of the gateways:
>> # gwcli ls
>
> Attached (gwcli.txt)
>
>> From the initiator node you send the /var/log/messages for:
>> # iscsiadm -m session -P 3
>
> Attached (iscsi-session.txt)
>
>> # multipath -ll
>
> 36001405d7480e5f84b94ab19ebeebd6c dm-0 LIO-ORG ,TCMU device
> size=3.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
> |-+- policy='queue-length 0' prio=50 status=active
> | `- 2:0:0:0 sdc 8:32 active ready running
> `-+- policy='queue-length 0' prio=10 status=enabled
>   `- 3:0:0:0 sdb 8:16 active ready running
>
>> 3. version info:
>>
>> # uname -a
>
> On the initiator:
> Linux xs-n1 4.4.0+2 #1 SMP Thu Jun 15 16:38:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> On the target:
> Linux iscsi1 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>
>> If you are using rpm do:
>> # rpm -q ceph-iscsi
>> # rpm -q tcmu-runner
>> # rpm -q python-rtslib
>
> No, I've installed them from source on the target.

What version of tcmu-runner did you use? Was it one of the 1.4 or 1.5
releases, or from the GitHub master branch?

There was a bug in the older 1.4 release: after a Linux kernel
initiator-side change, the handling of an error code we used went from
retrying for up to 5 minutes to retrying only 5 times. Those 5 retries
were used up in less than a second, which could cause the issue you are
seeing.

>> To map that to an iSCSI gateway, you can do the following.
>>
>> If sdb is the AO one, then run
>>
>> iscsiadm -m session -P 3
>>
>> Here you can see the sdXYZ name to iSCSI session mapping. The iSCSI
>> session/connection's target IP address from that command should match
>> the gateway that is listed as the "owner" of the LUN in the "gwcli ls"
>> output.
>
> I see... thanks for the hint.
>
> I've done a test: I've unmapped all the drives, then mapped the first
> gateway (iscsi1) on all the nodes, waited, then mapped the second
> gateway, to be sure that all the nodes would see the first node as the
> active/master.
> Now things seem a little better in "normal" VM use: I only see the
> "Cannot send after transport endpoint shutdown." on the secondary target
> node.
>
> I do see some hopping between the nodes when importing a disk drive, but
> at this point I'm starting to suspect some strange activity from the Xen
> infrastructure in that circumstance.
>
> --
> *Simone Lazzaris*
> *Qcom S.p.A. a socio unico*
> simone.lazzaris@xxxxxxx <mailto:simone.lazzaris@xxxxxxx> | www.qcom.it <https://www.qcom.it>
> *LinkedIn <https://www.linkedin.com/company/qcom-spa>* | *Facebook* <http://www.facebook.com/qcomspa>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
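As a side note on the "which path is the AO one" step quoted above, here is a rough sketch of how the path group with the highest priority could be picked out of the "multipath -ll" output from this thread. It assumes the alua prioritizer convention where the higher prio value (50 here, versus 10) marks the active/optimized group. The helper names parse_path_groups and optimized_devices are made up for illustration and are not part of multipath-tools or ceph-iscsi.

```python
import re

# Output of `multipath -ll` as quoted in this thread.
MULTIPATH_LL = """\
36001405d7480e5f84b94ab19ebeebd6c dm-0 LIO-ORG ,TCMU device
size=3.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 2:0:0:0 sdc 8:32 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  `- 3:0:0:0 sdb 8:16 active ready running
"""

# A path-group header line carries "prio=N status=S".
GROUP_RE = re.compile(r"prio=(\d+)\s+status=(\w+)")
# A path line carries "H:C:T:L sdX major:minor".
PATH_RE = re.compile(r"(\d+:\d+:\d+:\d+)\s+(sd[a-z]+)\s+\d+:\d+")


def parse_path_groups(text):
    """Return path groups as dicts with 'prio', 'status' and 'devices'."""
    groups = []
    for line in text.splitlines():
        group = GROUP_RE.search(line)
        if group:
            groups.append(
                {"prio": int(group.group(1)),
                 "status": group.group(2),
                 "devices": []}
            )
            continue
        path = PATH_RE.search(line)
        if path and groups:
            # Attach the path to the most recently seen group header.
            groups[-1]["devices"].append(path.group(2))
    return groups


def optimized_devices(text):
    """Devices in the highest-prio group (assumed to be the AO group)."""
    groups = parse_path_groups(text)
    if not groups:
        return []
    return max(groups, key=lambda g: g["prio"])["devices"]


if __name__ == "__main__":
    for grp in parse_path_groups(MULTIPATH_LL):
        print(grp)
    print("likely AO path(s):", optimized_devices(MULTIPATH_LL))
```

The device name this returns is the one to look for in the "iscsiadm -m session -P 3" output, and the target IP of that session can then be compared against the LUN owner shown by "gwcli ls".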