Hi,
if you check
ceph mgr module ls | jq -r '.always_on_modules[]'
you'll see that crash, orchestrator and other modules are always on
and can't be disabled. Without the pipe to jq you get the whole
output, which is a bit long if you just want an overview.
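For example, to confirm that crash and orchestrator are in there you
can run something like
ceph mgr module ls | jq -r '.always_on_modules[]' | grep -E 'crash|orchestrator'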
Anyway, comparing your enabled modules, you have diskprediction_local
enabled; can you disable it? I don't really expect it to be the root
cause, though.
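If you want to give it a try, disabling it should simply be
ceph mgr module disable diskprediction_local
assuming the mgr still responds to module commands at all.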
And what about the hanging cephadm shell sessions, can you see them on
the hosts?
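A plain
docker ps
on the affected host should be enough to spot them: cephadm shell
containers run the ceph image but get random names (like the
adoring_carver one further down) instead of the usual
ceph-<fsid>-... daemon names.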
Quoting Sebastian Luna Valero <sebastian.luna.valero@xxxxxxxxx>:
Hi Eugen,
Here it is:
# ceph mgr module ls | jq -r '.enabled_modules[]'
cephadm
dashboard
diskprediction_local
iostat
prometheus
restful
Should "crash" and "orchestrator" be part on the list? Why would have they
disappeared in the first place?
Best regards,
Sebastian
On Thu, 20 May 2021 at 15:54, Eugen Block <eblock@xxxxxx> wrote:
Which mgr modules are enabled? Can you share (if it responds):
ceph mgr module ls | jq -r '.enabled_modules[]'
> We have checked the call made from the container in the DEBUG logs
> and it looks correct; some commands work but others hang:
Do you see those shell sessions on the host(s)? I'm playing with a
pacific cluster and due to failing MONs I see a couple of lines like
these:
8684b2372083  docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe  osd tree  20 minutes ago  Up 20 minutes ago  adoring_carver
Here the 'ceph osd tree' command didn't finish, so I stopped that pod.
Maybe that could help, at least worth a try.
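Stopping such a leftover container is just the usual
docker stop adoring_carver
with whatever name docker ps reports for the stuck shell session.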
Quoting ManuParra <mparra@xxxxxx>:
> Hi Eugen, thank you very much for your reply. I'm Manuel, a colleague
> of Sebastián.
>
> Here is the additional information you asked for.
>
> We have checked more ceph commands, not only ceph crash and ceph orch;
> many other commands hang in the same way:
>
> [spsrc-mon-1 ~]# cephadm shell -- ceph pg stat
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph status
> Works
> [spsrc-mon-1 ~]# cephadm shell -- ceph progress
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph balancer status
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph crash ls
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph crash stat
> hangs forever
> [spsrc-mon-1 ~]# cephadm shell -- ceph telemetry status
> hangs forever
>
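> For reference, the same checks can be scripted so that nothing blocks
> forever, roughly like this (a sketch, assuming bash and the coreutils
> timeout utility):
>
> for cmd in "pg stat" "progress" "balancer status" "crash ls" "telemetry status"; do
>     echo "== ceph $cmd =="
>     timeout 30 cephadm shell -- ceph $cmd || echo "no answer within 30s"
> done
>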
> We have checked the call made from the container in the DEBUG logs
> and it looks correct; some commands work but others hang:
>
> 2021-05-20 09:56:02,903 DEBUG Running command (timeout=None):
> /bin/docker run --rm --ipc=host --net=host --privileged
> --group-add=disk -e CONTAINER_IMAGE=172.16.3.146:4000/ceph/ceph:v15.2.9
> -e NODE_NAME=spsrc-mon-1
> -v /var/run/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c:/var/run/ceph:z
> -v /var/log/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c:/var/log/ceph:z
> -v /var/lib/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c/crash:/var/lib/ceph/crash:z
> -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm
> -v /run/lock/lvm:/run/lock/lvm
> -v /var/lib/ceph/3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c/mon.spsrc-mon-1/config:/etc/ceph/ceph.conf:z
> -v /etc/ceph/ceph.client.admin.keyring:/etc/ceph/ceph.keyring:z
> --entrypoint ceph 172.16.3.146:4000/ceph/ceph:v15.2.9 pg stat
>
> We have 3 monitor nodes and these are the containers that are
> running (on all monitor nodes):
>
> acf8870fc788  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mds -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mds.manila.spsrc-mon-1.gpulzs
> cfac86f29db4  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mon -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mon.spsrc-mon-1
> 4e6e600fa915  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-crash…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-crash.spsrc-mon-1
> dae36c48568e  172.16.3.146:4000/ceph/ceph:v15.2.9  "/usr/bin/ceph-mgr -…"  7 days ago  Up 7 days  ceph-3cdbf59a-a74b-11ea-93cc-f0d4e2e6643c-mgr.spsrc-mon-1.eziiam
>
> All of them show a running status on all 3 monitor nodes. As you can
> see, on this monitor we have MDS, MON, CRASH and MGR.
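> The same per-host daemon list can also be pulled with something like
> cephadm ls | jq -r '.[].name'
> (field names may differ slightly between cephadm versions), in case
> that is easier to compare across the three monitors.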
>
> Any ideas on what we can check?
>
> Best regards,
> Manu
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx