Re: ceph orch status hangs forever

Hi,

if you check

ceph mgr module ls | jq -r '.always_on_modules[]'

you'll see that crash, orchestrator and several other modules are always on and can't be disabled. Without the pipe to jq you get the whole output, which is a bit long if you just want an overview. Anyway, looking at your enabled modules, you have diskprediction_local enabled; can you disable it? Although I don't really expect it to be the root cause.
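
In case you want to try that, it's just the standard module command (nothing cluster-specific needed):

ceph mgr module disable diskprediction_local

and 'ceph mgr module enable diskprediction_local' brings it back if it doesn't change anything.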

And what about the hanging cephadm shell sessions, can you see them on the hosts?

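To check for them on a host, a plain container listing should be enough; your cluster runs docker according to the debug output below, so something along these lines (just a grep sketch, adjust the image tag if yours differs):

docker ps | grep 'ceph/ceph:v15.2.9'

Stuck 'cephadm shell -- ...' invocations show up there with a random container name (like the adoring_carver example further down) instead of the usual ceph-<fsid>-<daemon> names.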

Quoting Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx>:

Hi Eugen,

Here it is:
#  ceph mgr module ls | jq -r '.enabled_modules[]'
cephadm
dashboard
diskprediction_local
iostat
prometheus
restful

Should "crash" and "orchestrator" be part on the list? Why would have they
disappeared in the first place?

Best regards,
Sebastian

On Thu, 20 May 2021 at 15:54, Eugen Block <eblock@xxxxxx> wrote:

Which mgr modules are enabled? Can you share (if it responds):

ceph mgr module ls | jq -r '.enabled_modules[]'

> We have checked the call made from the container in the DEBUG
> logs and I see that it is correct; some commands work but others
> hang:

Do you see those shell sessions on the host(s)? I'm playing with a
pacific cluster and due to failing MONs I see a couple of lines like
these:

8684b2372083  docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe  osd tree  20 minutes ago  Up 20 minutes ago  adoring_carver

Here the 'ceph osd tree' command didn't finish, so I stopped that pod.
Maybe that could help, at least worth a try.
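
If you find such a stuck container, stopping it by name or ID should be enough to get rid of it; since cephadm starts the shell with --rm (see the docker command in the debug log below), it is removed on stop. For example, with the name from the listing above:

docker stop adoring_carver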



Quoting ManuParra <mparra@xxxxxx>:

> Hi Eugen, thank you very much for your reply. I'm Manuel, a colleague
> of Sebastián.
>
> Here is the information you asked for.
>
> We have checked more ceph commands, not only ceph crash and ceph orch;
> many other commands hang as well:
>
> [spsrc-mon-1 ~]# cephadm shell -- ceph pg stat
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph status
> Works
> [spsrc-mon-1 ~]# cephadm shell -- ceph progress
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph balancer status
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph crash ls
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph crash stat
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph telemetry status
> hangs forever
>
> We have checked the call made from the container in the DEBUG
> logs and I see that it is correct; some commands work but others
> hang:
>
> 2021-05-20 09:56:02,903 DEBUG Running command (timeout=None):
> /bin/docker run --rm --ipc=host --net=host --privileged
> --group-add=disk -e CONTAINER_IMAGE=172.16.3.146:4000/ceph/ceph:v15.2.9
> -e NODE_NAME=spsrc-mon-1
> -v /var/run/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c:/var/run/ceph:z
> -v /var/log/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c:/var/log/ceph:z
> -v /var/lib/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c/crash:/var/lib/ceph/crash:z
> -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm
> -v /run/lock/lvm:/run/lock/lvm
> -v /var/lib/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c/mon.spsrc-mon-1/config:/etc/ceph/ceph.conf:z
> -v /etc/ceph/ceph.client.admin.keyring:/etc/ceph/ceph.keyring:z
> --entrypoint ceph 172.16.3.146:4000/ceph/ceph:v15.2.9 pg stat
>
>
>  We have 3 monitor nodes and these are the containers that are
> running (on all monitor nodes):
>
> acf8870fc788  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mds -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mds.manila.spsrc-mon-1.gpulzs
> cfac86f29db4  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mon -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mon.spsrc-mon-1
> 4e6e600fa915  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-crash…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-crash.spsrc-mon-1
> dae36c48568e  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mgr -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mgr.spsrc-mon-1.eziiam
>
> All of them are in running status on all 3 monitor nodes. As you see,
> on this monitor we have MDS, MON, CRASH and MGR.
>
> Any ideas what we can check?
>
> Best regards,
> Manu

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



