Hi Andreas,

On Wed, Jul 6, 2022 at 8:36 PM Andreas Teuchert <a.teuchert@xxxxxxxxxxxx> wrote:
>
> Hello Mathias and others,
>
> I also ran into this problem after upgrading from 16.2.9 to 17.2.1.
>
> Additionally I observed a health warning: "3 mgr modules have recently
> crashed".
>
> Those are actually two distinct crashes that are already in the tracker:
>
> https://tracker.ceph.com/issues/56269 and
> https://tracker.ceph.com/issues/56270

These trackers were filed under the mgr component, hence we didn't pick
them up during our (CephFS) weekly bug scrub.

>
> Considering that the crashes are in the snap_schedule module I assume
> that they are the reason why the module is not available.
>
> I can reproduce the crash in 56270 by failing over the mgr.
>
> I believe that the faulty code causing the error is this line:
> https://github.com/ceph/ceph/blob/v17.2.1/src/pybind/mgr/snap_schedule/fs/schedule_client.py#L193
>
> Instead of ioctx.remove(SNAP_DB_OBJECT_NAME) it should be
> ioctx.remove_object(SNAP_DB_OBJECT_NAME).

I've asked one of the CephFS developers to have a look at the issue.

>
> (According to my understanding of
> https://docs.ceph.com/en/latest/rados/api/python/.)
>
> Best regards,
>
> Andreas
>
>
> On 01.07.22 18:05, Kuhring, Mathias wrote:
> > Dear Ceph community,
> >
> > After upgrading our cluster to Quincy with cephadm (ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.1), I struggle to re-activate the snapshot schedule module:
> >
> > 0|0[root@osd-1 ~]# ceph mgr module enable snap_schedule
> > 0|1[root@osd-1 ~]# ceph mgr module ls | grep snap
> > snap_schedule on
> >
> > 0|0[root@osd-1 ~]# ceph fs snap-schedule list / --recursive
> > Error ENOENT: Module 'snap_schedule' is not available
> >
> > I tried restarting the MGR daemons and failed over a restarted one. But with no change.
> >
> > 0|0[root@osd-1 ~]# ceph orch restart mgr
> > Scheduled to restart mgr.osd-1 on host 'osd-1'
> > Scheduled to restart mgr.osd-2 on host 'osd-2'
> > Scheduled to restart mgr.osd-3 on host 'osd-3'
> > Scheduled to restart mgr.osd-4.oylrhe on host 'osd-4'
> > Scheduled to restart mgr.osd-5.jcfyqe on host 'osd-5'
> >
> > 0|0[root@osd-1 ~]# ceph orch ps --daemon_type mgr
> > NAME              HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> > mgr.osd-1         osd-1  *:8443,9283  running (61s)  35s ago    9M   402M     -        17.2.1   e5af760fa1c1  64f7ec70a6aa
> > mgr.osd-2         osd-2  *:8443,9283  running (47s)  36s ago    9M   103M     -        17.2.1   e5af760fa1c1  d25fdc793ff8
> > mgr.osd-3         osd-3  *:8443,9283  running (7h)   36s ago    9M   457M     -        17.2.1   e5af760fa1c1  46d5091e50d6
> > mgr.osd-4.oylrhe  osd-4  *:8443,9283  running (7h)   79s ago    9M   795M     -        17.2.1   e5af760fa1c1  efb2a7cc06c5
> > mgr.osd-5.jcfyqe  osd-5  *:8443,9283  running (8h)   37s ago    9M   448M     -        17.2.1   e5af760fa1c1  96dd03817f32
> >
> > 0|0[root@osd-1 ~]# ceph mgr fail
> >
> > The MGR confirms that the snap_schedule module is not available:
> >
> > 0|0[root@osd-1 ~]# journalctl -eu ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service
> >
> > Jul 01 16:25:49 osd-1 bash[662895]: debug 2022-07-01T14:25:49.825+0000 7f0486408700 0 log_channel(audit) log [DBG] : from='client.90801080 -' entity='client.admin' cmd=[{"prefix": "fs snap-schedule list", "path": "/", "recursive": true, "target": ["mon-mgr", ""]}]: dispatch
> > Jul 01 16:25:49 osd-1 bash[662895]: debug 2022-07-01T14:25:49.825+0000 7f0486c09700 -1 mgr.server reply reply (2) No such file or directory Module 'snap_schedule' is not available
> >
> > But I'm not sure where the MGR is actually looking. The module path is:
> >
> > 0|22[root@osd-1 ~]# ceph config get mgr mgr_module_path
> > /usr/share/ceph/mgr
> >
> > And while it is not available on the host (I assume these are just remnants from before our change to cephadm/docker, anyways):
> >
> > 0|0[root@osd-1 ~]# ll /usr/share/ceph/mgr
> > ...
> > drwxr-xr-x. 4 root root 144 22. Sep 2021 restful
> > drwxr-xr-x. 3 root root 61 22. Sep 2021 selftest
> > drwxr-xr-x. 3 root root 61 22. Sep 2021 status
> > drwxr-xr-x. 3 root root 117 22. Sep 2021 telegraf
> > ...
> >
> > The module is available in the MGR container (which I assume is where the MGR would look):
> >
> > 0|0[root@osd-1 ~]# docker exec -it ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f-mgr-osd-1 /bin/bash
> > [root@osd-1 /]# ls -l /usr/share/ceph/mgr
> > ...
> > drwxr-xr-x. 4 root root 65 Jun 23 19:48 snap_schedule
> > ...
> >
> > The module was available before on Pacific which was also cephadm deployed.
> > Has anybody an idea how I can further investigate this?
> > Thanks again for all your help!
> >
> > Best Wishes,
> > Mathias
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

-- 
Cheers,
Venky
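
P.S. For anyone who wants to see the failure mode Andreas describes without touching a cluster, here is a minimal sketch. Everything in it is illustrative: FakeIoctx is a hypothetical stand-in written only for this mail (it is not the rados binding), the object name is a placeholder value, and both helper functions are made up. The only thing taken from the thread is the claim that the I/O context offers remove_object() and not remove().

```python
class FakeIoctx:
    """Hypothetical stand-in for a rados I/O context (NOT the real binding).

    It deliberately provides remove_object() but no remove(), which is the
    situation described in the quoted message.
    """

    def __init__(self, objects):
        self.objects = dict(objects)

    def remove_object(self, key):
        # Same method name as the call suggested in the thread.
        del self.objects[key]
        return True


# Placeholder value; the real constant lives in the snap_schedule module.
SNAP_DB_OBJECT_NAME = "example_snap_db_object"


def buggy_cleanup(ioctx, name):
    """Calls a method the handle does not have, as in the reported line."""
    ioctx.remove(name)


def fixed_cleanup(ioctx, name):
    """Calls remove_object(), the replacement proposed by Andreas."""
    ioctx.remove_object(name)


if __name__ == "__main__":
    ioctx = FakeIoctx({SNAP_DB_OBJECT_NAME: b"old db dump"})
    try:
        buggy_cleanup(ioctx, SNAP_DB_OBJECT_NAME)
    except AttributeError as exc:
        print("buggy call failed:", exc)
    fixed_cleanup(ioctx, SNAP_DB_OBJECT_NAME)
    print("object still present:", SNAP_DB_OBJECT_NAME in ioctx.objects)
```

On a real cluster the handle would come from rados.Rados.open_ioctx() instead of the stand-in. In plain Python a call to a missing method only fails when that code path is actually executed, which would be consistent with the crash showing up on mgr failover; whether the traceback in tracker 56270 is exactly this AttributeError is not verified here.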