Re: Lock errors in iscsi gateway

On Monday 27 April 2020 at 18:46:09 CEST, Mike Christie wrote:

[snip]

> Are you using the ceph-iscsi tools with tcmu-runner or did you setup
> tcmu-runner directly with targetcli?
> 
I followed this guide: https://docs.ceph.com/docs/master//rbd/iscsi-target-cli/[1] and 
configured the target with gwcli, so I think I'm using ceph-iscsi tools.
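For reference, the gwcli workflow from that guide is roughly the following (just a sketch: the target IQN, 
gateway names/addresses and client IQN are placeholders, not my real values; only the 
rbd/rbdindex0.scsidisk0 image matches my setup):

        # gwcli
        > cd /iscsi-targets
        > create iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
        > cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
        > create iscsi1 192.168.1.101
        > create iscsi2 192.168.1.102
        > cd /disks
        > create pool=rbd image=rbdindex0.scsidisk0 size=1T
        > cd /iscsi-targets/iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/hosts
        > create iqn.1994-05.com.redhat:client1
        > disk add rbd/rbdindex0.scsidisk0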

[snip]
> 
> You would see these:
> 
> 1. when paths are discovered initially. The initiator is sending IO to
> all paths at the same time, so the lock is bouncing between all the paths.

OK, but the nodes are already configured and all paths have been discovered, so that's not the case.
 
> You should only see this for 10-60 seconds depending on how many paths
> you have, number of nodes, etc. When the multipath layer kicks in and
> adds the paths to the dm-multipath device then they should stop.

I see NO such logs while the system is running unless I start USING the LUNs.

> 2. during failover/failback when the multipath layer switches paths and
> one path takes the lock from the previously used one.

No failover/failback is occurring.
 
> Or, if you exported a disk to multiple initiator nodes, and some
> initiator nodes can't reach the active optimized path, so some
> initiators are using the optimized path and some are using the
> non-optimized path.

I have indeed exported the disk to multiple initiator nodes. How can I tell whether they are all using the 
active optimized path?
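
One way I can think of to check, assuming sg3_utils is installed on the initiators (sg_rtpg issues a 
REPORT TARGET PORT GROUPS and decodes the ALUA states), would be:

        sg_rtpg --decode /dev/sdb
        sg_rtpg --decode /dev/sdc

I would expect one device to report active/optimized (the gateway holding the lock) and the other 
non-optimized, and that on every node.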

> 3. If you have misconfigured the system. If you used active/active or
> had initiator nodes discover different paths for the same disk or not
> log into all the paths.

That may be the case, as I don't have much experience with multipath. Anyway, following the Ceph 
guide, I've set up the device in /etc/multipath.conf like this:
        device {
                vendor "LIO-ORG"
                product ".*"
                path_grouping_policy "failover"
                path_selector "queue-length 0"
                path_checker "tur"
                hardware_handler "1 alua"
                prio "alua"
                prio_args "exclusive_pref_bit"
                failback 60
                no_path_retry "queue"
                fast_io_fail_tmo 25
        }
and multipath -ll shows this on all six nodes:

36001405d7480e5f84b94ab19ebeebd6c dm-1 LIO-ORG ,TCMU device     
size=1.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| `- 16:0:0:0 sdb 8:16 active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  `- 15:0:0:0 sdc 8:32 active ready running

On all nodes, one path (sdb 8:16) is always "active" with prio 50 and the other (sdc 8:32) is always 
"enabled" with prio 10. I haven't figured out how I can check which iSCSI gateway is mapped to the 
"active" path...
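
What I plan to try, assuming the standard open-iscsi tools on the initiators, is to map each sdX device 
back to its iSCSI session and portal IP with iscsiadm; the portal of the session that owns sdb should 
then identify the gateway behind the active path:

        iscsiadm -m session -P 3 | grep -E 'Current Portal|Attached scsi disk'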
 

[snip]
> > Apr 27 17:36:01 iscsi2 tcmu-runner[2344]: tcmu_rbd_has_lock:516
> > rbd/rbdindex0.scsidisk0: Could not check lock ownership. Error: Cannot
> > send after transport endpoint shutdown.
> What are you using for path_checker in /etc/multipath.conf on the
> initiator side?

 path_checker is set to "tur".

> This is a bug but can be ignored. I am working on a fix. Basically, we
> the multipath layer is checking our state. We report we do not have the
> lock correctly to the initiator, but we also get this log message over
> and over when the multipath layer sends its path checker command.
> 
And that's ok...

Thanks for all the help you can provide!


Simone Lazzaris
Qcom S.p.A. a socio unico
simone.lazzaris@xxxxxxx[2] | www.qcom.it[3]
LinkedIn[4] | Facebook[5]






--------
[1] https://docs.ceph.com/docs/master//rbd/iscsi-target-cli/
[2] mailto:simone.lazzaris@xxxxxxx
[3] https://www.qcom.it
[4] https://www.linkedin.com/company/qcom-spa
[5] http://www.facebook.com/qcomspa
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



