Re: snap_schedule MGR module not available after upgrade to Quincy

Hi Andreas,

On Wed, Jul 6, 2022 at 8:36 PM Andreas Teuchert <a.teuchert@xxxxxxxxxxxx> wrote:
>
> Hello Mathias and others,
>
> I also ran into this problem after upgrading from 16.2.9 to 17.2.1.
>
> Additionally I observed a health warning: "3 mgr modules have recently
> crashed".
>
> Those are actually two distinct crashes that are already in the tracker:
>
> https://tracker.ceph.com/issues/56269 and
> https://tracker.ceph.com/issues/56270

These trackers were filed under the mgr component, so we didn't pick
them up during our weekly (cephfs) bug scrub.

>
> Considering that the crashes are in the snap_schedule module I assume
> that they are the reason why the module is not available.
>
> I can reproduce the crash in 56270 by failing over the mgr.
>
> I believe that the faulty code causing the error is this line:
> https://github.com/ceph/ceph/blob/v17.2.1/src/pybind/mgr/snap_schedule/fs/schedule_client.py#L193
>
> Instead of ioctx.remove(SNAP_DB_OBJECT_NAME) it should be
> ioctx.remove_object(SNAP_DB_OBJECT_NAME).

I've asked one of the CephFS developers to have a look at the issue.
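
For anyone following along, here is a minimal sketch of the change Andreas
suggests. It assumes the python-rados Ioctx API as described in the docs he
links (remove_object() exists, remove() does not). The helper name
remove_db_object and the object name in the usage comment are made up for
illustration and are not taken from the snap_schedule source.

```python
def remove_db_object(ioctx, object_name):
    """Remove a RADOS object through an open rados.Ioctx.

    Assumption: per the python-rados docs, Ioctx offers remove_object(key)
    but no remove() method, so ioctx.remove(...) would raise AttributeError
    when that code path is reached (e.g. after a mgr failover).
    """
    # Proposed call; the v17.2.1 line referenced above uses ioctx.remove(...)
    ioctx.remove_object(object_name)


# Hypothetical usage against a real cluster (pool and object names are
# placeholders, not values from this thread):
#
#   import rados
#   with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
#       with cluster.open_ioctx("some_metadata_pool") as ioctx:
#           remove_db_object(ioctx, "some_object_name")
```

This only illustrates the method name; the actual fix belongs in
schedule_client.py and should come from the tracker issues above.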

>
> (According to my understanding of
> https://docs.ceph.com/en/latest/rados/api/python/.)
>
> Best regards,
>
> Andreas
>
>
> On 01.07.22 18:05, Kuhring, Mathias wrote:
> > Dear Ceph community,
> >
> > After upgrading our cluster to Quincy with cephadm (ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.1), I am struggling to re-activate the snapshot schedule module:
> >
> > 0|0[root@osd-1 ~]# ceph mgr module enable snap_schedule
> > 0|1[root@osd-1 ~]# ceph mgr module ls | grep snap
> > snap_schedule         on
> >
> > 0|0[root@osd-1 ~]# ceph fs snap-schedule list / --recursive
> > Error ENOENT: Module 'snap_schedule' is not available
> >
> > I tried restarting the MGR daemons and failing over to a restarted one, but nothing changed.
> >
> > 0|0[root@osd-1 ~]# ceph orch restart mgr
> > Scheduled to restart mgr.osd-1 on host 'osd-1'
> > Scheduled to restart mgr.osd-2 on host 'osd-2'
> > Scheduled to restart mgr.osd-3 on host 'osd-3'
> > Scheduled to restart mgr.osd-4.oylrhe on host 'osd-4'
> > Scheduled to restart mgr.osd-5.jcfyqe on host 'osd-5'
> >
> > 0|0[root@osd-1 ~]# ceph orch ps --daemon_type mgr
> > NAME              HOST   PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> > mgr.osd-1         osd-1  *:8443,9283  running (61s)    35s ago   9M     402M        -  17.2.1   e5af760fa1c1  64f7ec70a6aa
> > mgr.osd-2         osd-2  *:8443,9283  running (47s)    36s ago   9M     103M        -  17.2.1   e5af760fa1c1  d25fdc793ff8
> > mgr.osd-3         osd-3  *:8443,9283  running (7h)     36s ago   9M     457M        -  17.2.1   e5af760fa1c1  46d5091e50d6
> > mgr.osd-4.oylrhe  osd-4  *:8443,9283  running (7h)     79s ago   9M     795M        -  17.2.1   e5af760fa1c1  efb2a7cc06c5
> > mgr.osd-5.jcfyqe  osd-5  *:8443,9283  running (8h)     37s ago   9M     448M        -  17.2.1   e5af760fa1c1  96dd03817f32
> >
> > 0|0[root@osd-1 ~]# ceph mgr fail
> >
> > The MGR log confirms that the snap_schedule module is not available:
> >
> > 0|0[root@osd-1 ~]# journalctl -eu ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service
> >
> > Jul 01 16:25:49 osd-1 bash[662895]: debug 2022-07-01T14:25:49.825+0000 7f0486408700  0 log_channel(audit) log [DBG] : from='client.90801080 -' entity='client.admin' cmd=[{"prefix": "fs snap-schedule list", "path": "/", "recursive": true, "target": ["mon-mgr", ""]}]: dispatch
> > Jul 01 16:25:49 osd-1 bash[662895]: debug 2022-07-01T14:25:49.825+0000 7f0486c09700 -1 mgr.server reply reply (2) No such file or directory Module 'snap_schedule' is not available
> >
> > But I'm not sure where the MGR is actually looking. The module path is:
> >
> > 0|22[root@osd-1 ~]# ceph config get mgr mgr_module_path
> > /usr/share/ceph/mgr
> >
> > While the module is not present on the host (I assume these directories are just remnants from before our move to cephadm/docker anyway):
> >
> > 0|0[root@osd-1 ~]# ll /usr/share/ceph/mgr
> > ...
> > drwxr-xr-x. 4 root root   144 22. Sep 2021  restful
> > drwxr-xr-x. 3 root root    61 22. Sep 2021  selftest
> > drwxr-xr-x. 3 root root    61 22. Sep 2021  status
> > drwxr-xr-x. 3 root root   117 22. Sep 2021  telegraf
> > ...
> >
> > The module is available in the MGR container (which I assume is where the MGR would look):
> >
> > 0|0[root@osd-1 ~]# docker exec -it ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f-mgr-osd-1 /bin/bash
> > [root@osd-1 /]# ls -l /usr/share/ceph/mgr
> > ...
> > drwxr-xr-x.  4 root root    65 Jun 23 19:48 snap_schedule
> > ...
> >
> > The module was available before on Pacific which was also cephadm deployed.
> > Does anybody have an idea how I can investigate this further?
> > Thanks again for all your help!
> >
> > Best Wishes,
> > Mathias
> >
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx


-- 
Cheers,
Venky
