Re: [EXTERNAL] Re: Can't connect to MDS admin socket after updating to cephadm




  If I look at the container name in docker it has the dots changed to
hyphens, but if I try to connect to the name with hyphens it doesn't
work either:

That is correct; the switch from dots to hyphens was introduced in Pacific [1]. Can you share the content of the file for that container? Can you enter other containers that were changed? Maybe the conversion doesn't work as expected?
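
To make the naming change concrete, here is a minimal Python sketch of the conversion as I understand it. The helper name container_name and the "ceph-<fsid>-<daemon name>" prefix are assumptions for illustration and are not taken from this thread; check 'docker ps' on your host for the actual name.

```python
def container_name(fsid: str, daemon_name: str, hyphenate: bool = True) -> str:
    """Build an illustrative container name for a cephadm daemon.

    Assumption: the name has the form ceph-<fsid>-<daemon name>, and
    newer releases replace every dot with a hyphen.
    """
    name = f"ceph-{fsid}-{daemon_name}"
    return name.replace(".", "-") if hyphenate else name


fsid = "d1fd0678-88c0-47fb-90da-e40a7a364442"
# Hyphenated form (what a newer cephadm would look for, under the assumption above).
print(container_name(fsid, "mds.cephfs.ceph-mds2.cjpsjm"))
# Dotted form (what an older cephadm would look for).
print(container_name(fsid, "mds.cephfs.ceph-mds2.cjpsjm", hyphenate=False))
```

If the tool and the running container disagree on which form to use, a lookup by name would fail even though the daemon is running.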


Quoting Luis Calero Muñoz <luis.calero@xxxxxxxxxxxxxx>:

Hello Eugen, thanks for your answer. I was able to connect like you
showed me until I updated my cluster to ceph version 16.2.10
(pacific). But now it doesn't work anymore:

root@ceph-mds2:~# cephadm ls |grep ceph-mds | grep name
       "name": "mds.cephfs.ceph-mds2.cjpsjm",

root@ceph-mds2:~# cephadm enter --name mds.cephfs.ceph-mds2.cjpsjm
Inferring fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Error: No such container:

  If I look at the container name in docker it has the dots changed to
hyphens, but if I try to connect to the name with hyphens it doesn't
work either:

root@ceph-mds2:~# docker ps
CONTAINER ID   IMAGE               COMMAND                  CREATED             STATUS             PORTS     NAMES
38635f6de533   "/usr/bin/ceph-mds -…"   About an hour ago   Up About an hour

root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm
ERROR: must pass --fsid to specify cluster
root@ceph-mds2:~# cephadm enter --name mds-cephfs-ceph-mds2-cjpsjm --fsid d1fd0678-88c0-47fb-90da-e40a7a364442
Traceback (most recent call last):
 File "/usr/sbin/cephadm", line 6158, in <module>
   r = args.func()
 File "/usr/sbin/cephadm", line 1309, in _infer_fsid
   return func()
 File "/usr/sbin/cephadm", line 3580, in command_enter
   (daemon_type, daemon_id) ='.', 1)
ValueError: not enough values to unpack (expected 2, got 1)
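
The ValueError in the traceback above is consistent with the daemon name being split on its first dot into a type and an ID. The following minimal sketch reproduces that behaviour; split_daemon_name is a hypothetical helper written for illustration, not cephadm's own code.

```python
from typing import Tuple


def split_daemon_name(name: str) -> Tuple[str, str]:
    """Split '<type>.<id>' on the first dot only (hypothetical helper)."""
    daemon_type, daemon_id = name.split(".", 1)
    return daemon_type, daemon_id


# The dotted name splits into a type and an ID.
print(split_daemon_name("mds.cephfs.ceph-mds2.cjpsjm"))

# The hyphenated name contains no dot, so unpacking two values fails.
try:
    split_daemon_name("mds-cephfs-ceph-mds2-cjpsjm")
except ValueError as exc:
    print("ValueError:", exc)
```

In other words, --name appears to expect the dotted form reported by 'cephadm ls', not the hyphenated container name shown by docker.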

  What can I do?


Luis Calero Muñoz
Head of Infrastructure

T. +34 91 787 0000
C/ Apolonio Morales 13C - 28036 Madrid

On Thu, 3 Nov 2022 at 11:48, Eugen Block (<eblock@xxxxxx>) wrote:


You can use cephadm for that now [1]. To attach to a running daemon,
run the following (use 'cephadm ls' to list all cephadm daemons):

cephadm enter --name <DAEMON> [--fsid <FSID>]
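
If you want to pick the daemon name in a script rather than by eye, the 'cephadm ls' output can be filtered programmatically. This is a minimal sketch, assuming the output is a JSON list of objects that each carry a "name" field (the only field visible in this thread); daemon_names and the sample data are illustrative.

```python
import json
from typing import List


def daemon_names(cephadm_ls_json: str, daemon_type: str) -> List[str]:
    """Return the names of all daemons of one type, e.g. 'mds'.

    Assumption: the input is a JSON list of objects with a "name"
    field of the form '<type>.<id>'.
    """
    entries = json.loads(cephadm_ls_json)
    prefix = daemon_type + "."
    return [e["name"] for e in entries if e.get("name", "").startswith(prefix)]


# Sample data for illustration only; real output has more fields.
sample = '[{"name": "mds.cephfs.storage01.ozpeev"}, {"name": "mon.storage01"}]'
print(daemon_names(sample, "mds"))
```

The returned name is what you would pass to 'cephadm enter --name'.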

There you can query the daemon as you used to:

storage01:~ # cephadm ls |grep mds
         "name": "mds.cephfs.storage01.ozpeev",

storage01:~ # cephadm enter --name mds.cephfs.storage01.ozpeev
Inferring fsid 877636d0-d118-11ec-83c7-fa163e990a3e
[ceph: root@storage01 /]# ceph daemon mds.cephfs.storage01.ozpeev ops
     "ops": [],
     "num_ops": 0

You can still restart the daemons with systemctl:

storage01:~ # systemctl restart



Quoting Luis Calero Muñoz <luis.calero@xxxxxxxxxxxxxx>:

> Hello, I'm running a Ceph 15.2.15 Octopus cluster, and in preparation for
> updating it I first converted it to cephadm following the instructions
> on the website. All went well, but now I'm having a problem running "ceph
> daemon mds.* dump_ops_in_flight" because it gives me an error:
> root@ceph-mds2:~# ceph -s |grep mds
>     mds: cephfs:2 {0=cephfs.ceph-mds1.edwbhe=up:active,1=cephfs.ceph-mds2.cjpsjm=up:active} 2 up:standby
> root@ceph-mds2:~# ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>   One thing I've noticed is that the names of the MDS daemons have changed:
> before cephadm I could refer to them as mds.ceph-mds2, and now they're
> called mds.cephfs.ceph-mds2.cjpsjm, where the last part is a random
> string that changes when the daemon is restarted. Running strace on the
> ceph daemon command, I found out that the command is
> looking for a socket in a location that doesn't exist:
> root@ceph-mds2:~# strace ceph daemon mds.cephfs.ceph-mds2.cjpsjm dump_ops_in_flight
> [...]
> connect(3, {sa_family=AF_UNIX, sun_path="/var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok"}, 53) = -1 ENOENT (No such file or directory)
> write(2, "admin_socket: exception getting "..., 90admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
>   That is because the socket is actually in a subdirectory of /var/run/ceph:
> root@ceph-mds2:~# ls /var/run/ceph/
> d1fd0678-88c0-47fb-90da-e40a7a364442/
> root@ceph-mds2:~# ls /var/run/ceph/d1fd0678-88c0-47fb-90da-e40a7a364442/
> ceph-mds.cephfs.ceph-mds2.cjpsjm.asok
>    So if I link the socket to
> /var/run/ceph/ceph-mds.cephfs.ceph-mds2.cjpsjm.asok, the command runs
> without problems. That would be a workaround, but I would need to recreate
> the link every time the daemon restarts, so I think something is not right
> here and this should work out of the box. What can I do?
>    Besides that, I've noticed that after converting to cephadm and docker I
> can't restart the MDS servers with "service ceph-mds@ceph-mds1 restart"
> anymore. What's the proper method to restart them now?
>   Regards.
> --
>   Luis
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
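
The socket location described in the quoted message can be computed from the cluster fsid and the daemon name. This is a minimal sketch, assuming the layout /var/run/ceph/<fsid>/ceph-<daemon name>.asok that the directory listing in the quoted message shows; admin_socket_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def admin_socket_path(fsid: str, daemon_name: str,
                      run_dir: str = "/var/run/ceph") -> PurePosixPath:
    """Return the per-cluster admin socket path for a daemon.

    Assumption: sockets live under <run_dir>/<fsid>/ and are named
    ceph-<daemon name>.asok, as in the listing quoted in this thread.
    """
    return PurePosixPath(run_dir) / fsid / f"ceph-{daemon_name}.asok"


path = admin_socket_path("d1fd0678-88c0-47fb-90da-e40a7a364442",
                         "mds.cephfs.ceph-mds2.cjpsjm")
print(path)
```

As far as I know, 'ceph daemon' also accepts a socket path in place of a daemon name, which might avoid the symlink; please verify that on your version before relying on it.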

