On 4/27/20 10:43 AM, Simone Lazzaris wrote:
> Hi;
>
> I've built two iSCSI gateways for our (small) Ceph cluster. The cluster is a Nautilus
> installation, 4 nodes with 9x4TB each, and it's working fine. We mainly use it via the S3
> object storage interface, but I've also deployed some RBD block devices and a CephFS
> filesystem.
>
> Now I'm trying to connect it to my XenServer installation. XenServer doesn't speak RADOS,
> so I've built the iSCSI gateways. Right now they are self-hosted on the XenServer, with a
> plan to move them onto physical boxes if/when needed.
>
> The gateways are built on CentOS 8, with tcmu-runner just cloned from git (I think it's
> 1.5.2). I've been able to connect them to our six-node XenServer cluster, and now I'm
> trying to use it.

Are you using the ceph-iscsi tools with tcmu-runner, or did you set up tcmu-runner directly
with targetcli?

> When I attempt a migration of a VM disk onto the new iSCSI volume, I get these messages in
> the logfile that I find very worrying:
>
> Apr 27 17:32:21 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:22 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:22 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1

You would see these:

1. When paths are discovered initially. The initiator is sending IO to all paths at the same
time, so the lock is bouncing between all the paths. You should only see this for 10-60
seconds, depending on how many paths you have, the number of nodes, etc. Once the multipath
layer kicks in and adds the paths to the dm-multipath device, they should stop.

2. During failover/failback, when the multipath layer switches paths and one path takes the
lock from the previously used one. Or, if you exported a disk to multiple initiator nodes and
some initiator nodes can't reach the active/optimized path, so some initiators are using the
optimized path and some are using the non-optimized path.

3. If you have misconfigured the system: if you used active/active, or had initiator nodes
discover different paths for the same disk, or not log into all the paths. See the
initiator-side multipath sketch below.
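For comparison, the device section that the upstream Ceph iSCSI initiator docs suggest for
tcmu-runner/RBD-backed LUNs looks roughly like the sketch below, i.e. failover path grouping
with ALUA rather than active/active. Treat it as a starting point only: XenServer ships and
manages its own /etc/multipath.conf, and the vendor/product strings have to match what your
gateways actually report.

    devices {
        device {
            # must match the inquiry data your gateways report
            vendor                 "LIO-ORG"
            product                "TCMU device"
            hardware_handler       "1 alua"
            # one active path group at a time; do not run the RBD lock active/active
            path_grouping_policy   "failover"
            path_selector          "queue-length 0"
            path_checker           "tur"
            prio                   "alua"
            prio_args              "exclusive_pref_bit"
            failback               60
            fast_io_fail_tmo       25
            no_path_retry          "queue"
        }
    }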
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:23 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:25 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1
> Apr 27 17:32:25 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:26 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:26 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:27 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1
> Apr 27 17:32:27 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:28 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:28 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:29 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1
> Apr 27 17:32:29 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:30 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:30 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:31 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1
> Apr 27 17:32:31 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:32 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:32 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:33 iscsi2 tcmu-runner[2344]: tcmu_notify_lock_lost:222 rbd/rbdindex0.scsidisk0: Async lock drop. Old state 1
> Apr 27 17:32:33 iscsi2 tcmu-runner[2344]: alua_implicit_transition:574 rbd/rbdindex0.scsidisk0: Starting lock acquisition operation.
> Apr 27 17:32:34 iscsi2 tcmu-runner[2344]: tcmu_rbd_lock:762 rbd/rbdindex0.scsidisk0: Acquired exclusive lock.
> Apr 27 17:32:34 iscsi2 tcmu-runner[2344]: tcmu_acquire_dev_lock:441 rbd/rbdindex0.scsidisk0: Lock acquisition successful
> Apr 27 17:32:36 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
>
> After a while the migration fails, and I keep seeing the error in the logs:
>
> Apr 27 17:36:01 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.

What are you using for path_checker in /etc/multipath.conf on the initiator side?

This is a bug, but it can be ignored. I am working on a fix. Basically, the multipath layer is
checking our state. We correctly report to the initiator that we do not have the lock, but we
also get this log message over and over when the multipath layer sends its path checker
command.
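If you want to see what the initiator is actually using, something like the following on the
XenServer dom0 should show it. These are standard device-mapper-multipath commands, nothing
Ceph-specific; the grep pattern is just an example.

    # effective settings after the built-in hwtable and /etc/multipath.conf are merged
    multipathd show config | grep -E 'path_checker|path_grouping_policy|prio'

    # current path groups, which path is active, and the per-path checker state
    multipath -ll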
> Apr 27 17:36:06 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:08 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:09 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:16 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:21 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:21 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:26 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:28 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:29 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
> Apr 27 17:36:36 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516 rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
>
> Any hints? Is this a bug?
> --
> *Simone Lazzaris*
> *Qcom S.p.A. a socio unico*
> simone.lazzaris@xxxxxxx[1] | www.qcom.it[2]
> *LinkedIn[3]* | *Facebook[4]*

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx