Re: [EXTERNAL] Re: Can't connect to MDS admin socket after updating to cephadm

Eugen Block <eblock@xxxxxx> · Thu, 10 Nov 2022 14:25:31 +0000

Hi,

> If I look at the container name in docker it has the dots changed to
> hyphens, but if I try to connect to the name with hyphens it doesn't
> work either:

That is correct; the switch from dots to hyphens in container names was
introduced in Pacific [1]. Can you share the content of the unit.run file
for that container? Can you enter other containers that were renamed?
Maybe the conversion didn't work as expected.

[1] https://github.com/ceph/ceph/pull/42242
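
For illustration, the naming pattern visible in this thread can be sketched in Python. The helper name `container_name` is made up for this example and is not cephadm's own code; it only reproduces the pattern seen in the `docker ps` output and the "No such container" error quoted below (`ceph-<fsid>-<daemon name>`, with or without the dots replaced by hyphens).

```python
def container_name(fsid: str, daemon_name: str, hyphenated: bool = True) -> str:
    """Build a container name for a daemon (hypothetical helper).

    hyphenated=True mimics the name shown by `docker ps` in this thread;
    hyphenated=False mimics the dotted name from the error message.
    """
    name = f"ceph-{fsid}-{daemon_name}"
    # The fsid contains no dots, so replacing on the whole string is safe.
    return name.replace(".", "-") if hyphenated else name


fsid = "d1fd0678-88c0-47fb-90da-e40a7a364442"
print(container_name(fsid, "mds.cephfs.ceph-mds2.cjpsjm"))
print(container_name(fsid, "mds.cephfs.ceph-mds2.cjpsjm", hyphenated=False))
```

The first line printed matches the running container; the second matches the name that `cephadm enter` looked for and did not find.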

Quoting Luis Calero Muñoz <luis.calero@xxxxxxxxxxxxxx>:

Hello Eugen, thanks for your answer. I was able to connect like you
showed me until I updated my cluster to ceph version 16.2.10
(pacific). But now it doesn't work anymore:

root@ceph-mds2:~# cephadm ls |grep ceph-mds | grep name
       "name": "mds.cephfs.ceph-mds2.cjpsjm",

root@ceph-mds2:~# cephadm enter --name mds.cephfs.ceph-mds2.cjpsjm
Inferring fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Error: No such container: ceph-d1fd0678-88c0-47fb-90da-e40a7a364442-mds.cephfs.ceph-mds2.cjpsjm

  If I look at the container name in docker it has the dots changed to
hyphens, but if I try to connect to the name with hyphens it doesn't
work either:

root@ceph-mds2:~# docker ps
CONTAINER ID   IMAGE               COMMAND                  CREATED             STATUS             PORTS     NAMES
38635f6de533   quay.io/ceph/ceph   "/usr/bin/ceph-mds -…"   About an hour ago   Up About an hour             ceph-d1fd0678-88c0-47fb-90da-e40a7a364442-mds-cephfs-ceph-mds2-cjpsjm

root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm
ERROR: must pass --fsid to specify cluster
root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm --fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Traceback (most recent call last):
 File "/usr/sbin/cephadm", line 6158, in <module>
   r = args.func()
 File "/usr/sbin/cephadm", line 1309, in _infer_fsid
   return func()
 File "/usr/sbin/cephadm", line 3580, in command_enter
   (daemon_type, daemon_id) = args.name.split('.', 1)
ValueError: not enough values to unpack (expected 2, got 1)
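
The traceback follows from the line it quotes: `split('.', 1)` only yields two parts when the name contains at least one dot. A minimal reproduction, independent of cephadm (the function name `split_daemon_name` is made up for this example):

```python
from typing import Tuple


def split_daemon_name(name: str) -> Tuple[str, str]:
    # Same expression as in the traceback: expects "<type>.<id>".
    daemon_type, daemon_id = name.split(".", 1)
    return daemon_type, daemon_id


# Dotted name as printed by `cephadm ls`: splits into type and id.
print(split_daemon_name("mds.cephfs.ceph-mds2.cjpsjm"))

# Hyphenated container name: no dot, so unpacking fails.
try:
    split_daemon_name("mds-cephfs-ceph-mds2-cjpsjm")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

So `--name` expects the dotted daemon name, not the hyphenated container name.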

  What can I do?

--
  Luis

Luis Calero Muñoz
Head of Infrastructure

luis.calero@xxxxxxxxxxxxxx
T. +34 91 787 0000
C/ Apolonio Morales 13C - 28036 Madrid

letsrebold.com

On Thu, 3 Nov 2022 at 11:48, Eugen Block (<eblock@xxxxxx>) wrote:

Hi,

You can use cephadm for that now [1]. Run 'cephadm ls' to list all
cephadm daemons, then attach to a running one with:

cephadm enter --name <DAEMON> [--fsid <FSID>]

There you can query the daemon as you used to:

storage01:~ # cephadm ls |grep mds
         "name": "mds.cephfs.storage01.ozpeev",

storage01:~ # cephadm enter --name mds.cephfs.storage01.ozpeev
Inferring fsid 877636d0-d118-11ec-83c7-fa163e990a3e
[ceph: root@storage01 /]# ceph daemon mds.cephfs.storage01.ozpeev ops
{
     "ops": [],
     "num_ops": 0
}

You can still restart the daemons with systemctl:

storage01:~ # systemctl restart ceph-877636d0-d118-11ec-83c7-fa163e990a3e@mds.cephfs.storage01.ozpeev.service

Regards,
Eugen

[1] https://docs.ceph.com/en/latest/man/8/cephadm/?highlight=cephadm%20enter

Quoting Luis Calero Muñoz <luis.calero@xxxxxxxxxxxxxx>:

> Hello, I'm running a Ceph 15.2.15 Octopus cluster, and in preparation for
> updating it I first converted it to cephadm following the instructions
> on the website. All went well, but now I'm having a problem running "ceph
> daemon mds.* dump_ops_in_flight" because it gives me an error:
>
> root@ceph-mds2:~# ceph -s |grep mds
>     mds: cephfs:2 {0=cephfs.ceph-mds1.edwbhe=up:active,1=cephfs.ceph-mds2.cjpsjm=up:active} 2 up:standby
>
> root@ceph-mds2:~# ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
>
> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>
>   One thing I've noticed is that the names of the MDS daemons have changed:
> before cephadm I could refer to them as mds.ceph-mds2, and now they're
> called mds.cephfs.ceph-mds2.cjpsjm, where the last part is a random
> string that changes when the daemon is restarted. Running strace on the
> ceph daemon command, I found out that the problem is that the command is
> looking for a socket in a location that doesn't exist:
>
> root@ceph-mds2:~# strace ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
> [...]
> connect(3, {sa_family=AF_UNIX, sun_path="/var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok"}, 53) = -1 ENOENT (No such file or directory)
> write(2, "admin_socket: exception getting "..., 90admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>
>
>   That is because the socket is actually in a subdirectory of /var/run/ceph:
>
> root@ceph-mds2:~# ls /var/run/ceph/
>
> d1fd0678-88c0-47fb-90da-e40a7a364442/
>
>
> root@ceph-mds2:~# ls /var/run/ceph/d1fd0678-88c0-47fb-90da-e40a7a364442/
>
> ceph-mds.cephfs.ceph-mds2.cjpsjm.asok
>
>    So if I link the socket to
> /var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok then the command runs
> without problems. That would be a fix, but I would need to recreate the link
> every time the daemon restarts, so I think that something is not right here
> and it should work out of the box. What can I do?
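
As a possible stopgap that avoids the symlink, the socket path can be built from the fsid and the daemon name and handed to `ceph --admin-daemon`, which takes a socket path directly. The sketch below only prints the command and does not run it. The helper name `asok_path` is made up for this example, and the directory layout is assumed from the `ls` output above rather than taken from any documentation.

```python
import shlex
from pathlib import PurePosixPath


def asok_path(fsid: str, daemon_name: str, run_dir: str = "/var/run/ceph") -> PurePosixPath:
    """Return the admin socket path for a daemon (hypothetical helper).

    Assumes the layout seen in this thread:
    <run_dir>/<fsid>/ceph-<daemon_name>.asok
    """
    return PurePosixPath(run_dir) / fsid / f"ceph-{daemon_name}.asok"


sock = asok_path("d1fd0678-88c0-47fb-90da-e40a7a364442", "mds.cephfs.ceph-mds2.cjpsjm")
cmd = ["ceph", "--admin-daemon", str(sock), "dump_ops_in_flight"]

# Print a copy-pasteable command line instead of executing it.
print(shlex.join(cmd))
```

Because the last part of the daemon name can change, the name would have to be looked up again (for example with `cephadm ls`) before building the path.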
>
>    Besides that, I've noticed that after switching to cephadm and docker I
> can't restart the MDS servers with "service ceph-mds@ceph-mds1 restart"
> anymore. What's the proper method to restart them now?
>
>   Regards.
>
>
> --
>   Luis
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
