On 08/06/2019 11:28 AM, Mike Christie wrote:
> On 08/06/2019 07:51 AM, Matthias Leopold wrote:
>>
>> On 05.08.19 at 18:31, Mike Christie wrote:
>>> On 08/05/2019 05:58 AM, Matthias Leopold wrote:
>>>> Hi,
>>>>
>>>> I'm still testing my 2-node (dedicated) iSCSI gateway with Ceph 12.2.12
>>>> before I dare to put it into production. I installed the latest tcmu-runner
>>>> release (1.5.1) and (as before) I'm seeing both nodes switch the
>>>> exclusive locks for the disk images every 21 seconds. The tcmu-runner
>>>> logs look like this:
>>>>
>>>> 2019-08-05 12:53:04.184 13742 [WARN] tcmu_notify_lock_lost:222 rbd/iscsi.test03: Async lock drop. Old state 1
>>>> 2019-08-05 12:53:04.714 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03: Acquired exclusive lock.
>>>> 2019-08-05 12:53:25.186 13742 [WARN] tcmu_notify_lock_lost:222 rbd/iscsi.test03: Async lock drop. Old state 1
>>>> 2019-08-05 12:53:25.773 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03: Acquired exclusive lock.
>>>>
>>>> Old state can sometimes be 0 or 2.
>>>> Is this expected behaviour?
>>>
>>> What initiator OS are you using?
>>>
>>
>> I'm using CentOS 7 initiators and I had somehow missed configuring
>> multipathd on them correctly (device { vendor "LIO.ORG" ... }). After
>> fixing that, the above problem disappeared and the output of "multipath
>> -ll" finally looks correct. Thanks for pointing me to this.
>>
>> Nevertheless, there's now another problem visible in the logs. As soon as
>> an initiator logs in, tcmu-runner on the gateway node that doesn't own
>> the image being accessed logs
>>
>> [ERROR] tcmu_rbd_has_lock:516 rbd/iscsi.test02: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
>>
>> This disappears after the OSD blacklist entries for the node expire
>> (visible with "ceph osd blacklist ls"). I haven't yet understood how
>> this is supposed to work. Right now I restarted from scratch (logged
>> out, waited until all blacklist entries disappeared, logged in) and I'm
>> again seeing several blacklist entries for both gateway nodes (and the
>> above error message in tcmu-runner.log). This doesn't seem to interfere
>> with the iSCSI service, but I want this explained/resolved before I can
>> start using the gateways.
>
> This is expected. Before multipath kicks in during path
> addition/readdition and during failover/failback you can have IO on
> multiple paths, so the lock is going to bounce temporarily and gws are
> going to be blacklisted.
>
> It should not happen non-stop like you saw in the original email.

Actually, you can see that message:

Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.

non-stop if the initiator/multipath layer is doing certain types of path testing. For example, if it were using INQUIRY or RTPG, instead of TUR, for the path-testing IOs, then every N seconds you would see that message logged.

I will send a patch to quiet it for those types of situations.
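
For anyone hitting the same multipathd problem on CentOS 7 initiators, the relevant multipath.conf device section looks roughly like the following. This is only a sketch based on the device settings documented for ceph-iscsi gateways: the vendor string tcmu-runner normally reports is "LIO-ORG" (not "LIO.ORG"), and the exact values (failback, no_path_retry, timeouts, etc.) should be checked against the documentation for your release.

    devices {
            device {
                    # matches the vendor/product strings exported by tcmu-runner backed LUNs
                    vendor                 "LIO-ORG"
                    product                "TCMU device"
                    hardware_handler       "1 alua"
                    path_grouping_policy   "failover"
                    path_selector          "queue-length 0"
                    failback               60
                    # TUR-based path checking, see the note below
                    path_checker           "tur"
                    prio                   "alua"
                    prio_args              exclusive_pref_bit
                    fast_io_fail_tmo       25
                    no_path_retry          queue
            }
    }

The path_checker "tur" line ties in with the point above: TUR-based path testing does not trip the lock-ownership check on the non-owning gateway, whereas INQUIRY/RTPG-based testing can make that "Could not check lock ownership" message show up every polling interval.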
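
To confirm that the locks and blacklist entries settle down after a login or failover, the commands already mentioned in this thread can be used as a quick check. The pool/image names and the log location are just the ones from this thread (tcmu-runner writes to tcmu-runner.log, by default under /var/log):

    # on an initiator: expect one active (owner) path and one enabled (non-owner) path per LUN
    multipath -ll

    # on a gateway or monitor node: blacklist entries added during lock hand-off should expire on their own
    ceph osd blacklist ls

    # on a gateway node: watch for repeated "Async lock drop" / "Acquired exclusive lock" cycling
    grep -E "lock" /var/log/tcmu-runner.log | tail

If the lock messages keep cycling every few seconds after multipathd is configured as above, that points back at the path-testing behaviour discussed earlier rather than at the gateways themselves.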