Hello Eugen, thanks for your answer. I was able to connect the way you
showed me until I updated my cluster to ceph version 16.2.10 (pacific),
but now it doesn't work anymore:

root@ceph-mds2:~# cephadm ls | grep ceph-mds | grep name
    "name": "mds.cephfs.ceph-mds2.cjpsjm",

root@ceph-mds2:~# cephadm enter --name mds.cephfs.ceph-mds2.cjpsjm
Inferring fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Error: No such container: ceph-d1fd0678-88c0-47fb-90da-e40a7a364442-mds.cephfs.ceph-mds2.cjpsjm

If I look at the container name in docker, the dots have been changed to
hyphens, but if I try to connect with the hyphenated name it doesn't work
either:

root@ceph-mds2:~# docker ps
CONTAINER ID   IMAGE               COMMAND                  CREATED             STATUS             PORTS   NAMES
38635f6de533   quay.io/ceph/ceph   "/usr/bin/ceph-mds -…"   About an hour ago   Up About an hour           ceph-d1fd0678-88c0-47fb-90da-e40a7a364442-mds-cephfs-ceph-mds2-cjpsjm

root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm
ERROR: must pass --fsid to specify cluster

root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm --fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 6158, in <module>
    r = args.func()
  File "/usr/sbin/cephadm", line 1309, in _infer_fsid
    return func()
  File "/usr/sbin/cephadm", line 3580, in command_enter
    (daemon_type, daemon_id) = args.name.split('.', 1)
ValueError: not enough values to unpack (expected 2, got 1)

What can I do?
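A possible stopgap might be to bypass cephadm entirely and exec into the
container with docker, using the hyphenated name that "docker ps" reports
above (just a sketch, not verified on 16.2.10):

# Skip cephadm's name parsing and enter the container directly:
docker exec -it ceph-d1fd0678-88c0-47fb-90da-e40a7a364442-mds-cephfs-ceph-mds2-cjpsjm bash

# Inside the container the admin socket should be under /var/run/ceph,
# so the usual query ought to work from there:
ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight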
--
Luis Calero Muñoz
Head of Infrastructure
luis.calero@xxxxxxxxxxxxxx
T. +34 91 787 0000
C/ Apolonio Morales 13C - 28036 Madrid
letsrebold.com

On Thu, 3 Nov 2022 at 11:48, Eugen Block (<eblock@xxxxxx>) wrote:
>
> Hi,
>
> you can use cephadm for that now [1]. To attach to a running daemon,
> run (use 'cephadm ls' to see all cephadm daemons):
>
> cephadm enter --name <DAEMON> [--fsid <FSID>]
>
> There you can query the daemon as you used to:
>
> storage01:~ # cephadm ls | grep mds
>         "name": "mds.cephfs.storage01.ozpeev",
>
> storage01:~ # cephadm enter --name mds.cephfs.storage01.ozpeev
> Inferring fsid 877636d0-d118-11ec-83c7-fa163e990a3e
> [ceph: root@storage01 /]# ceph daemon mds.cephfs.storage01.ozpeev ops
> {
>     "ops": [],
>     "num_ops": 0
> }
>
> You can still restart the daemons with systemctl:
>
> storage01:~ # systemctl restart ceph-877636d0-d118-11ec-83c7-fa163e990a3e@mds.cephfs.storage01.ozpeev.service
>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/latest/man/8/cephadm/?highlight=cephadm%20enter
>
> Quoting Luis Calero Muñoz <luis.calero@xxxxxxxxxxxxxx>:
>
> > Hello, I'm running a ceph 15.2.15 Octopus cluster, and in preparation
> > for updating it I first converted it to cephadm following the
> > instructions on the website. All went well, but now I'm having a
> > problem running "ceph daemon mds.* dump_ops_in_flight" because it
> > gives me an error:
> >
> > root@ceph-mds2:~# ceph -s | grep mds
> >     mds: cephfs:2 {0=cephfs.ceph-mds1.edwbhe=up:active,1=cephfs.ceph-mds2.cjpsjm=up:active} 2 up:standby
> >
> > root@ceph-mds2:~# ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
> > admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
> >
> > One thing I've noticed is that the names of the MDS daemons have
> > changed: before cephadm I could refer to them as mds.ceph-mds2, and
> > now they're called something like mds.cephfs.ceph-mds2.cjpsjm, where
> > the last part is a random string that changes whenever the daemon is
> > restarted.
> > Running strace on the "ceph daemon" command, I found out that the
> > problem is that the command looks for a socket in a location that
> > doesn't exist:
> >
> > root@ceph-mds2:~# strace ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
> > [...]
> > connect(3, {sa_family=AF_UNIX, sun_path="/var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok"}, 53) = -1 ENOENT (No such file or directory)
> > write(2, "admin_socket: exception getting "..., 90admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
> >
> > The socket is actually in a subdirectory of /var/run/ceph:
> >
> > root@ceph-mds2:~# ls /var/run/ceph/
> > d1fd0678-88c0-47fb-90da-e40a7a364442/
> >
> > root@ceph-mds2:~# ls /var/run/ceph/d1fd0678-88c0-47fb-90da-e40a7a364442/
> > ceph-mds.cephfs.ceph-mds2.cjpsjm.asok
> >
> > So if I symlink the socket to
> > /var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok, the command runs
> > without problems (see the sketch after this message). That would be a
> > fix, but I would need to recreate the link every time the daemon
> > restarts, so I think something is not right here; it should work out
> > of the box. What can I do?
> >
> > Besides that, I've noticed that after moving to cephadm and docker I
> > can't restart the MDS daemons with "service ceph-mds@ceph-mds1
> > restart" anymore. What's the proper way to restart them now?
> >
> > Regards.
> >
> > --
> > Luis
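For reference, the symlink workaround described in the quoted message would
look something like the following sketch (fsid and daemon name taken from
this thread; the link has to be recreated after every daemon restart, so
this is a stopgap rather than a fix):

# Point the legacy admin socket path at the per-cluster socket that the
# cephadm-managed daemon actually creates:
fsid=d1fd0678-88c0-47fb-90da-e40a7a364442
name=mds.cephfs.ceph-mds2.cjpsjm
ln -sf /var/run/ceph/$fsid/ceph-$name.asok /var/run/ceph/ceph-$name.asok

# After which the query works directly on the host:
ceph daemon $name dump_ops_in_flight

And, following the systemctl pattern from Eugen's reply, restarting this
particular MDS under cephadm would presumably be:

systemctl restart ceph-d1fd0678-88c0-47fb-90da-e40a7a364442@mds.cephfs.ceph-mds2.cjpsjm.service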