Re: Lock errors in iscsi gateway

On 4/27/20 10:43 AM, Simone Lazzaris wrote:
> Hi;
> 
> I've built two iscsi gateways for our (small) ceph cluster. The cluster is a nautilus installation, 4 
> nodes with 9x4TB each, and it's working fine. We mainly use it via the s3 object storage interface, 
> but I've also deployed some rbd block devices and a cephfs filesystem.
> 
> Now I'm trying to connect it to my xenserver installation. Xenserver doesn't speak rados, so 
> I've built the iscsi gateways. Right now they are self-hosted on the xenserver, with a plan to 
> move them onto physical boxes if/when needed.
> 
> The gateways are built on centos8, with tcmu-runner just cloned from git (I think it's 1.5.2). I've 
> been able to connect them to our six-node xenserver cluster, and now I'm trying to use it.
> 

Are you using the ceph-iscsi tools with tcmu-runner, or did you set up
tcmu-runner directly with targetcli?
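
If you are not sure, something along these lines should show which
stack is in play (service and command names here are the ones shipped
by the ceph-iscsi packages; adjust if your install differs):

  # ceph-iscsi setup: the rbd-target-api/rbd-target-gw services are
  # running and gwcli lists the gateways and disks.
  systemctl status rbd-target-api rbd-target-gw
  gwcli ls

  # bare tcmu-runner setup: the user:rbd backstores show up directly
  # in targetcli instead.
  targetcli ls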


> When I attempt a migration of a VM disk onto the new iscsi volume, I get these messages 
> in the logfile, which I find very worrying:
> 
> 
> Apr 27 17:32:21 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:22 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:22 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
You would see these messages:

1. When paths are discovered initially. The initiator sends IO to all
paths at the same time, so the lock bounces between all the paths.

You should only see this for 10-60 seconds, depending on how many paths
you have, the number of nodes, etc. Once the multipath layer kicks in
and adds the paths to the dm-multipath device, the messages should stop.

2. During failover/failback, when the multipath layer switches paths and
one path takes the lock from the previously used one.

Or, if you exported a disk to multiple initiator nodes and some of them
can't reach the active optimized path, so some initiators are using the
optimized path and some are using the non-optimized path.

3. If you have misconfigured the system, e.g. if you used active/active,
had initiator nodes discover different paths for the same disk, or did
not log into all the paths. You can check how the initiator grouped the
paths as sketched below.
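
A rough way to check this from the initiator side (XenServer's dom0 in
your case), using the standard dm-multipath/open-iscsi tools; XenServer
may wrap these in its own storage manager, so treat this as a sketch:

  # Each rbd-backed LUN should end up with one path group active (the
  # AO path on the lock-holding gateway) and the rest enabled (ANO).
  multipath -ll

  # Every initiator node should be logged into the same set of gateway
  # portals for that disk.
  iscsiadm -m session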



> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:25 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
> Apr 27 17:32:25 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:26 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:26 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:27 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
> Apr 27 17:32:27 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:28 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:28 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:29 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
> Apr 27 17:32:29 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:30 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:30 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:31 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
> Apr 27 17:32:31 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:32 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:32 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:33 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: 
> Async lock drop. Old state 1
> Apr 27 17:32:33 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/
> rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:34 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: 
> Acquired exclusive lock.
> Apr 27 17:32:34 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/
> rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:36 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> 
> 
> After a while the migration fails, and I keep seeing the error in the logs:
> 
> Apr 27 17:36:01 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.

What are you using for path_checker in /etc/multipath.conf on the
initiator side?

This is a bug but can be ignored. I am working on a fix. Basically, the
multipath layer is checking our state. We correctly report to the
initiator that we do not have the lock, but we also get this log message
over and over when the multipath layer sends its path checker command.
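
For reference, the initiator-side device settings in the upstream Ceph
iSCSI docs look roughly like the stanza below. This is from memory, so
double check it against the docs, and note that XenServer ships its own
multipath.conf defaults which you may need to merge with:

  devices {
          device {
                  vendor                 "LIO-ORG"
                  product                "TCMU device"
                  hardware_handler       "1 alua"
                  path_grouping_policy   "failover"
                  path_selector          "queue-length 0"
                  failback               60
                  path_checker           tur
                  prio                   alua
                  prio_args              exclusive_pref_bit
                  fast_io_fail_tmo       25
                  no_path_retry          queue
          }
  }

The parts that matter for the messages above are the failover grouping
and alua prio, so only the lock-holding gateway gets IO.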


> Apr 27 17:36:06 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:08 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:09 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:16 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:21 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:21 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:26 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:28 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:29 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:36 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: 
> Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> 
> Any hints? Is this a bug?
>  -- 
> *Simone Lazzaris*
> *Qcom S.p.A. a socio unico*
> simone.lazzaris@xxxxxxx | www.qcom.it
> *LinkedIn* | *Facebook*
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


