Hi,
I'm running a Ceph cluster with 4x iSCSI exporter nodes and oVirt on the client side. In the tcmu-runner logs I see the following happening every few seconds:
###
2019-10-22 10:11:11.231 1710 [WARN] tcmu_rbd_lock:762 rbd/image.lun0: Acquired exclusive lock.
2019-10-22 10:11:11.395 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:12.346 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0: Async lock drop. Old state 1
2019-10-22 10:11:12.353 1710 [INFO] alua_implicit_transition:566 rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:13.325 1710 [INFO] alua_implicit_transition:566 rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:13.852 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:13.854 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun1: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:13.863 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun1: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:14.202 1710 [INFO] alua_implicit_transition:566 rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:14.285 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:15.217 1710 [WARN] tcmu_rbd_lock:762 rbd/image.lun0: Acquired exclusive lock.
2019-10-22 10:11:15.873 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
2019-10-22 10:11:16.696 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0: Async lock drop. Old state 1
2019-10-22 10:11:16.696 1710 [INFO] alua_implicit_transition:566 rbd/image.lun0: Starting lock acquisition operation.
2019-10-22 10:11:16.696 1710 [WARN] tcmu_notify_lock_lost:222 rbd/image.lun0: Async lock drop. Old state 2
2019-10-22 10:11:16.992 1710 [ERROR] tcmu_rbd_has_lock:516 rbd/image.lun2: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
###
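For reference, which gateway currently owns the exclusive lock (and which clients have the image open) can be checked from any Ceph node with the rbd tool, e.g. like this (assuming the pool is really called "rbd", as the rbd/image.lunX prefix in the log suggests):
###
# show the current exclusive-lock holder of one of the affected images
rbd lock ls rbd/image.lun0

# show the image status, including the watchers (clients that have it open)
rbd status rbd/image.lun0
###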
This happens on all four of my iSCSI exporter nodes. The blacklist shows the following (the number of blacklisted entries does not really shrink):
### ceph osd blacklist ls
listed 10579 entries ###
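To see which addresses those entries belong to (i.e. whether it is mostly the gateway IPs that keep getting blacklisted), something like this should work, assuming the usual "<ip>:<port>/<nonce> <expiry>" output format of `ceph osd blacklist ls`:
###
# count blacklist entries per client IP, most frequent first
ceph osd blacklist ls 2>/dev/null \
  | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
  | sort | uniq -c | sort -rn
###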
On the client side I configured multipath like this:
###
device {
        vendor                 "LIO-ORG"
        hardware_handler       "1 alua"
        path_grouping_policy   "failover"
        path_selector          "queue-length 0"
        failback               60
        path_checker           tur
        prio                   alua
        prio_args              exclusive_pref_bit
        fast_io_fail_tmo       25
        no_path_retry          queue
}
###
And multipath -ll shows me all four paths as "active ready" without errors.
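The ALUA state that the initiator actually sees on each path can also be queried directly with sg_rtpg from sg3_utils, e.g. (the device name is just an example, pick one of the iSCSI path devices listed by multipath -ll):
###
# decode the target port group / ALUA states reported by one path device
sg_rtpg --decode /dev/sdc
###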
To me this looks like tcmu-runner cannot acquire the exclusive lock and it is flapping between nodes. In addition, in the Ceph GUI / dashboard I can see that the "active / optimized" state of the LUNs is flapping between nodes ...
I have installed the following versions (CentOS 7.7, Ceph 13.2.6):
### rpm -qa |egrep "ceph|iscsi|tcmu|rst|kernel"
python-cephfs-13.2.6-0.el7.x86_64
ceph-selinux-13.2.6-0.el7.x86_64
kernel-3.10.0-957.5.1.el7.x86_64
kernel-3.10.0-957.1.3.el7.x86_64
kernel-tools-libs-3.10.0-1062.1.2.el7.x86_64
libcephfs2-13.2.6-0.el7.x86_64
libtcmu-1.4.0-106.gd17d24e.el7.x86_64
ceph-common-13.2.6-0.el7.x86_64
ceph-osd-13.2.6-0.el7.x86_64
tcmu-runner-1.4.0-106.gd17d24e.el7.x86_64
kernel-3.10.0-1062.1.2.el7.x86_64
ceph-iscsi-3.3-1.el7.noarch
kernel-headers-3.10.0-1062.1.2.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64
ceph-base-13.2.6-0.el7.x86_64
kernel-tools-3.10.0-1062.1.2.el7.x86_64
###
Greets, Kilian