On 08/06/2019 11:28 AM, Mike Christie wrote:
> On 08/06/2019 07:51 AM, Matthias Leopold wrote:
>>
>> On 05.08.19 at 18:31, Mike Christie wrote:
>>> On 08/05/2019 05:58 AM, Matthias Leopold wrote:
>>>> Hi,
>>>>
>>>> I'm still testing my 2-node (dedicated) iSCSI gateway with Ceph 12.2.12
>>>> before I dare to put it into production. I installed the latest tcmu-runner
>>>> release (1.5.1) and (as before) I'm seeing both nodes switch the
>>>> exclusive locks for the disk images every 21 seconds. The tcmu-runner
>>>> logs look like this:
>>>>
>>>> 2019-08-05 12:53:04.184 13742 [WARN] tcmu_notify_lock_lost:222 rbd/iscsi.test03: Async lock drop. Old state 1
>>>> 2019-08-05 12:53:04.714 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03: Acquired exclusive lock.
>>>> 2019-08-05 12:53:25.186 13742 [WARN] tcmu_notify_lock_lost:222 rbd/iscsi.test03: Async lock drop. Old state 1
>>>> 2019-08-05 12:53:25.773 13742 [WARN] tcmu_rbd_lock:762 rbd/iscsi.test03: Acquired exclusive lock.
>>>>
>>>> Old state can sometimes be 0 or 2.
>>>> Is this expected behaviour?
>>>
>>> What initiator OS are you using?
>>>
>>
>> I'm using CentOS 7 initiators and I had somehow missed configuring
>> multipathd on them correctly (device { vendor "LIO.ORG" ... }). After
>> fixing that, the above problem disappeared and the output of "multipath
>> -ll" finally looks correct. Thanks for pointing me to this.
>>
>> Nevertheless, there's now another problem visible in the logs. As soon as
>> an initiator logs in, tcmu-runner on the gateway node that doesn't own
>> the image being accessed logs
>>
>> [ERROR] tcmu_rbd_has_lock:516 rbd/iscsi.test02: Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.
>>
>> This disappears after the OSD blacklist entries for the node expire
>> (visible with "ceph osd blacklist ls"). I haven't yet understood how
>> this is supposed to work. Right now I restarted from scratch (logged
>> out, waited until all blacklist entries disappeared, logged in) and I'm
>> again seeing several blacklist entries for both gateway nodes (and the
>> above error message in tcmu-runner.log). This doesn't seem to interfere
>> with the iSCSI service, but I want this explained/resolved before I can
>> start using the gateways.
>
> This is expected. Before multipath kicks in during path
> addition/readdition and during failover/failback you can have IO on
> multiple paths, so the lock is going to bounce temporarily and gws are
> going to be blacklisted.
>
> It should not happen non-stop like you saw in the original email.

Actually, you can see that message:

Could not check lock ownership. Error: Cannot send after transport endpoint shutdown.

non-stop if the initiator/multipath layer is doing certain types of path testing. For example, if it were using INQUIRY or RTPG, instead of TUR, for the path-testing IOs, then every N seconds you would see that message logged.

I will send a patch to quiet it for those types of situations.
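
For anyone hitting the same multipathd problem on CentOS 7 initiators, the relevant multipath.conf device section looks roughly like the following. This is only a sketch based on the device settings documented for ceph-iscsi gateways: the vendor string tcmu-runner normally reports is "LIO-ORG" (not "LIO.ORG"), and the exact values (failback, no_path_retry, timeouts, etc.) should be checked against the documentation for your release.

    devices {
            device {
                    # matches the vendor/product strings exported by tcmu-runner backed LUNs
                    vendor                 "LIO-ORG"
                    product                "TCMU device"
                    hardware_handler       "1 alua"
                    path_grouping_policy   "failover"
                    path_selector          "queue-length 0"
                    failback               60
                    # TUR-based path checking, see the note below
                    path_checker           "tur"
                    prio                   "alua"
                    prio_args              exclusive_pref_bit
                    fast_io_fail_tmo       25
                    no_path_retry          queue
            }
    }

The path_checker "tur" line ties in with the point above: TUR-based path testing does not trip the lock-ownership check on the non-owning gateway, whereas INQUIRY/RTPG-based testing can make that "Could not check lock ownership" message show up every polling interval.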
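
To confirm that the locks and blacklist entries settle down after a login or failover, the commands already mentioned in this thread can be used as a quick check. The pool/image names and the log location are just the ones from this thread (tcmu-runner writes to tcmu-runner.log, by default under /var/log):

    # on an initiator: expect one active (owner) path and one enabled (non-owner) path per LUN
    multipath -ll

    # on a gateway or monitor node: blacklist entries added during lock hand-off should expire on their own
    ceph osd blacklist ls

    # on a gateway node: watch for repeated "Async lock drop" / "Acquired exclusive lock" cycling
    grep -E "lock" /var/log/tcmu-runner.log | tail

If the lock messages keep cycling every few seconds after multipathd is configured as above, that points back at the path-testing behaviour discussed earlier rather than at the gateways themselves.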