Re: tcmu-runner: "Acquired exclusive lock" every 21s

On 08/06/2019 07:51 AM, Matthias Leopold wrote:
> 
> 
> Am 05.08.19 um 18:31 schrieb Mike Christie:
>> On 08/05/2019 05:58 AM, Matthias Leopold wrote:
>>> Hi,
>>>
>>> I'm still testing my 2-node (dedicated) iSCSI gateway with ceph 12.2.12
>>> before I dare to put it into production. I installed the latest tcmu-runner
>>> release (1.5.1) and (like before) I'm seeing both nodes pass the exclusive
>>> lock for the disk images back and forth every 21 seconds. The tcmu-runner
>>> logs look like this:
>>>
>>> 2019-08-05 12:53:04.184 13742 [WARN] tcmu_notify_lock_lost:222
>>> rbd/iscsi.test03: Async lock drop. Old state 1
>>> 2019-08-05 12:53:04.714 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03:
>>> Acquired exclusive lock.
>>> 2019-08-05 12:53:25.186 13742 [WARN] tcmu_notify_lock_lost:222
>>> rbd/iscsi.test03: Async lock drop. Old state 1
>>> 2019-08-05 12:53:25.773 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03:
>>> Acquired exclusive lock.
>>>
>>> Old state can sometimes be 0 or 2.
>>> Is this expected behaviour?
>>
>> What initiator OS are you using?
>>
> 
> I'm using CentOS 7 initiators and had somehow neglected to configure
> multipathd on them correctly (device { vendor "LIO-ORG" ... }). After
> fixing that, the problem above disappeared and the output of "multipath
> -ll" finally looks correct. Thanks for pointing me to this.
> 
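
For reference, the multipath.conf device section the ceph-iscsi docs recommend
looks roughly like the sketch below (exact values can differ between releases,
so treat it as a starting point and check the docs for the version you run;
the vendor/product strings have to match whatever "multipath -ll" reports for
the gateway LUNs):

devices {
        device {
                # T10 vendor/product IDs that LIO/TCMU-backed LUNs
                # normally report -- verify against "multipath -ll"
                vendor                  "LIO-ORG"
                product                 "TCMU device"
                # ALUA handler plus failover grouping, so only the
                # path to the lock-owning gateway is used at a time
                hardware_handler        "1 alua"
                path_grouping_policy    "failover"
                path_selector           "queue-length 0"
                failback                60
                path_checker            tur
                prio                    alua
                prio_args               exclusive_pref_bit
                fast_io_fail_tmo        25
                no_path_retry           queue
        }
}
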
> Nevertheless, there's now another problem visible in the logs. As soon as
> an initiator logs in, tcmu-runner on the gateway node that doesn't own
> the image being accessed logs
> 
> [ERROR] tcmu_rbd_has_lock:516 rbd/iscsi.test02: Could not check lock
> ownership. Error: Cannot send after transport endpoint shutdown.
> 
> This disappears after the osd blacklist entries for the node expire
> (visible with "ceph osd blacklist ls"). I haven't yet understood how
> this is supposed to work. Just now I restarted from scratch (logged
> out, waited until all blacklist entries had disappeared, logged back in)
> and I'm again seeing several blacklist entries for both gateway nodes
> (and the above error message in tcmu-runner.log). This doesn't seem to
> interfere with the iSCSI service, but I'd like to have it
> explained/resolved before I start using the gateways.

This is expected. Before multipath kicks in during path
addition/re-addition and during failover/failback, you can have IO on
multiple paths, so the lock will bounce temporarily and the gateways
will get blacklisted.

It should not happen non-stop like it did in your original email.
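
A bit of background on the error message itself: "Cannot send after transport
endpoint shutdown" is just ESHUTDOWN, which is what librados/librbd returns
once a client has been blacklisted, so seeing it on the gateway that just lost
the lock is consistent with the blacklisting above. If you want to watch it
settle, something along these lines is enough (the rm is only for manually
clearing a stale entry while debugging; normally the entries simply expire on
their own):

# list current blacklist entries -- a gateway address shows up here right
# after a lock bounce, together with the time the entry expires
ceph osd blacklist ls

# manually drop one entry while debugging (the address below is just an
# example; copy the exact entry, including :port/nonce, from the ls output)
ceph osd blacklist rm 192.168.0.11:0/3710147553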
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


