Re: tcmu-runner crashing on 16.2.5


 




However, gwcli still shows the other two gateways, which are no longer enabled. Where is this list of gateways stored?

All of this configuration is stored in the "gateway.conf" object in the "rbd" pool.


How do I access this object? Is it a file or some kind of object store?
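
As a rough illustration of what could be done with that object once it has been read out: to my understanding "gateway.conf" is a RADOS object holding JSON text, so it is not a file on disk. The sketch below assumes a top-level "disks" mapping whose entries carry an "owner" field; that layout is an assumption and is not confirmed anywhere in this thread. The function name, the sample data, and the output file name are hypothetical. The pool to read from is also an open question here: the reply names "rbd", while the service spec later in this thread sets "pool: iscsi-config".

```python
import json

def stale_disk_owners(config, active_hosts):
    """Map each disk to its owner when that owner is not an active gateway.

    Assumes (not confirmed in this thread) that the decoded object has a
    top-level "disks" mapping of "pool/image" -> {"owner": hostname, ...}.
    """
    active = set(active_hosts)
    return {
        disk: meta["owner"]
        for disk, meta in config.get("disks", {}).items()
        if meta.get("owner") and meta["owner"] not in active
    }

# Hypothetical, heavily trimmed stand-in for the real object contents.
# The real text could be saved first with something like:
#   rados -p <pool> get gateway.conf gateway.json
sample = json.loads("""
{
  "disks": {
    "iscsi-pool-0001/iscsi-p0001-img-01": {"owner": "cxcto-c240-j27-02.cisco.com"},
    "iscsi-pool-0001/iscsi-p0001-img-02": {"owner": "cxcto-c240-j27-04.cisco.com"}
  }
}
""")

active = ["cxcto-c240-j27-02.cisco.com", "cxcto-c240-j27-03.cisco.com"]
print(stale_disk_owners(sample, active))
```

This only reads and reports; it does not modify the object, which seems the safer starting point while the gateways are serving I/O.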



It looks like the two gateways that are no longer part of the cluster are still listed as the owners of some of the LUNs:

/iscsi-targets> ls
o- iscsi-targets ................................................................................. [DiscoveryAuth: CHAP, Targets: 3]
  o- iqn.2001-07.com.ceph:1622752075720 .................................................................. [Auth: CHAP, Gateways: 4]
  | o- disks ............................................................................................................ [Disks: 5]
  | | o- iscsi-pool-0001/iscsi-p0001-img-01 ........................................... [Owner: cxcto-c240-j27-02.cisco.com, Lun: 0]
  | | o- iscsi-pool-0001/iscsi-p0001-img-02 ........................................... [Owner: cxcto-c240-j27-04.cisco.com, Lun: 3]
  | | o- iscsi-pool-0003/iscsi-p0003-img-01 ........................................... [Owner: cxcto-c240-j27-03.cisco.com, Lun: 1]
  | | o- iscsi-pool-0003/iscsi-p0003-img-02 ........................................... [Owner: cxcto-c240-j27-05.cisco.com, Lun: 4]
  | | o- iscsi-pool-0005/iscsi-p0005-img-01 ........................................... [Owner: cxcto-c240-j27-02.cisco.com, Lun: 2]
  | o- gateways .............................................................................................. [Up: 2/4, Portals: 4]
  | | o- cxcto-c240-j27-02.cisco.com ......................................................................... [10.122.242.197 (UP)]
  | | o- cxcto-c240-j27-03.cisco.com ......................................................................... [10.122.242.198 (UP)]
  | | o- cxcto-c240-j27-04.cisco.com .................................................................... [10.122.242.199 (UNKNOWN)]
  | | o- cxcto-c240-j27-05.cisco.com .................................................................... [10.122.242.200 (UNKNOWN)]
  | o- host-groups .................................................................................................... [Groups : 0]
  | o- hosts ........................................................................................ [Auth: ACL_DISABLED, Hosts: 0]
  o- iqn.2001-07.com.ceph:1622752147345 .................................................................. [Auth: CHAP, Gateways: 4]
  | o- disks ............................................................................................................ [Disks: 5]
  | | o- iscsi-pool-0002/iscsi-p0002-img-01 ........................................... [Owner: cxcto-c240-j27-04.cisco.com, Lun: 0]
  | | o- iscsi-pool-0002/iscsi-p0002-img-02 ........................................... [Owner: cxcto-c240-j27-02.cisco.com, Lun: 3]
  | | o- iscsi-pool-0004/iscsi-p0004-img-01 ........................................... [Owner: cxcto-c240-j27-05.cisco.com, Lun: 1]
  | | o- iscsi-pool-0004/iscsi-p0004-img-02 ........................................... [Owner: cxcto-c240-j27-03.cisco.com, Lun: 4]
  | | o- iscsi-pool-0006/iscsi-p0006-img-01 ........................................... [Owner: cxcto-c240-j27-03.cisco.com, Lun: 2]
  | o- gateways .............................................................................................. [Up: 2/4, Portals: 4]
  | | o- cxcto-c240-j27-02.cisco.com ......................................................................... [10.122.242.197 (UP)]
  | | o- cxcto-c240-j27-03.cisco.com ......................................................................... [10.122.242.198 (UP)]
  | | o- cxcto-c240-j27-04.cisco.com .................................................................... [10.122.242.199 (UNKNOWN)]
  | | o- cxcto-c240-j27-05.cisco.com .................................................................... [10.122.242.200 (UNKNOWN)]
  | o- host-groups .................................................................................................... [Groups : 0]
  | o- hosts ........................................................................................ [Auth: ACL_DISABLED, Hosts: 0]
  o- iqn.2001-07.com.ceph:1627307422533 .................................................................. [Auth: CHAP, Gateways: 4]
    o- disks ............................................................................................................ [Disks: 1]
    | o- iscsi-pool-0007/iscsi-p0007-img-01 ........................................... [Owner: cxcto-c240-j27-04.cisco.com, Lun: 0]
    o- gateways .............................................................................................. [Up: 2/4, Portals: 4]
    | o- cxcto-c240-j27-02.cisco.com ......................................................................... [10.122.242.197 (UP)]
    | o- cxcto-c240-j27-03.cisco.com ......................................................................... [10.122.242.198 (UP)]
    | o- cxcto-c240-j27-04.cisco.com .................................................................... [10.122.242.199 (UNKNOWN)]
    | o- cxcto-c240-j27-05.cisco.com .................................................................... [10.122.242.200 (UNKNOWN)]
    o- host-groups .................................................................................................... [Groups : 0]
    o- hosts ........................................................................................ [Auth: ACL_DISABLED, Hosts: 0]


Currently only cxcto-c240-j27-02 and cxcto-c240-j27-03 are enabled, so I would not expect to see cxcto-c240-j27-04 and cxcto-c240-j27-05 owning any of the LUNs, but as you can see, they are there. Is this a known issue, and is there a way to clean it up? Worst case, now that I know how to make sure the ESXi hosts see all the paths, I can just bring back the other two that I had removed, but I was curious whether there was a way to clean this up. I’m guessing something is missing in what cephadm does to clean up when it removes a node.
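
For spotting this kind of mismatch without reading the tree by eye, the gwcli listing itself can be scanned. The sketch below is only a text filter over pasted "ls" output like the one above; the regular expressions and the function name are my own, they assume the "[Owner: ..., Lun: N]" and "[IP (STATE)]" shapes shown in this message, and they tolerate the "<http://...>" residue some mail clients add after hostnames. It is not part of gwcli or ceph-iscsi.

```python
import re

# Disk lines look like: o- pool/image .... [Owner: host, Lun: N]
OWNER_RE = re.compile(r"o- (\S+/\S+) \.+ \[Owner: ([^,<\]]+)")
# Gateway lines look like: o- host .... [10.0.0.1 (UP)]
GATEWAY_RE = re.compile(r"o- (\S+?)(?:<[^>]*>)? \.+ \[\S+ \((\w+)\)\]")

def disks_with_down_owner(gwcli_output):
    """Return (disk, owner) pairs whose owner gateway is not reported UP."""
    states = {}
    owners = []
    for line in gwcli_output.splitlines():
        match = OWNER_RE.search(line)
        if match:
            owners.append((match.group(1), match.group(2).strip()))
            continue
        match = GATEWAY_RE.search(line)
        if match:
            states[match.group(1)] = match.group(2)
    return [(disk, owner) for disk, owner in owners if states.get(owner) != "UP"]

# Shortened sample in the same shape as the listing above.
sample_output = """\
o- iqn.2001-07.com.ceph:1622752075720 ... [Auth: CHAP, Gateways: 4]
  | o- disks ... [Disks: 2]
  | | o- iscsi-pool-0001/iscsi-p0001-img-01 ... [Owner: cxcto-c240-j27-02.cisco.com, Lun: 0]
  | | o- iscsi-pool-0001/iscsi-p0001-img-02 ... [Owner: cxcto-c240-j27-04.cisco.com, Lun: 3]
  | o- gateways ... [Up: 1/2, Portals: 2]
  | | o- cxcto-c240-j27-02.cisco.com ... [10.122.242.197 (UP)]
  | | o- cxcto-c240-j27-04.cisco.com ... [10.122.242.199 (UNKNOWN)]
"""

for disk, owner in disks_with_down_owner(sample_output):
    print(f"{disk} is owned by {owner}, which is not UP")
```

Anything it reports is only a hint about where ownership is stale; it says nothing about how to reassign the LUNs safely.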

It seems that either cephadm or you didn't clean that up. How did those two stale gateways come about? Were you using them before the upgrade, and did you switch to the -02 and -03 ones afterwards?


I’m not sure what you mean by “you didn’t clean that up”. Are there steps I need to take to clean up besides re-applying the configuration using ceph orch?

This cluster was initially installed on 16.2.4. The only upgrade I’ve done was to 16.2.5. All 4 gateways were present before and after the upgrade. I only recently removed the two (well, actually I had removed 3 of them leaving only one) as a workaround for this problem. After I figured out what was causing the issue on the ESXi side, I added one back.

The way that I’m adding / removing is through a yaml file like this:

service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - cxcto-c240-j27-02.cisco.com
    - cxcto-c240-j27-03.cisco.com
spec:
  pool: iscsi-config

(I’ve removed the lines with the username / password here)

Originally the file had 4 hosts, then I switched it to 1, and now there are 2. I’m applying the configuration using "ceph orch apply -i iscsi.yaml".

ceph orch ls seems to show the correct configuration, with only two gateways.

BTW, I’ve had a problem since day 1 that I filed a bug for (https://tracker.ceph.com/issues/51111#change-199548). I’m not sure if it’s related, but it looks like tracking the tcmu-runner containers has never quite worked properly.

-Paul




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx