On Thu, Nov 17, 2022 at 6:02 PM phandaal <phandaal@xxxxxxxxxxxx> wrote:

> On 2022-11-17 12:58, Milind Changire wrote:
> > Christian,
> > Some obvious questions ...
> >
> > 1. What Linux distribution have you deployed Ceph on ?
>
> Gentoo Linux, using python 3.10.
> Ceph is only used for CephFS, data pool using EC8+3 on spinners,
> metadata using replication on SSDs.
>
> > 2. The snap_schedule db has indeed been moved to an SQLite DB in rados
> > in Quincy.
> > So, is there ample storage space in your metadata pool to move this DB
> > to ?
>
> Should be :
>
> # ceph df
> --- RAW STORAGE ---
> CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
> hdd    160 TiB   46 TiB  114 TiB   114 TiB      71.20
> ssd    3.5 TiB  3.5 TiB   15 GiB    15 GiB       0.42
> TOTAL  164 TiB   50 TiB  114 TiB   114 TiB      69.69
>
> --- POOLS ---
> POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
> .mgr              1    1  195 MiB       33  585 MiB   0.02    1.1 TiB
> cephfs-metadata   4   32  3.8 GiB  415.03k   11 GiB   0.33    1.1 TiB
> cephfs-data       5  128   83 TiB   26.75M  113 TiB  77.77     24 TiB
>
> 1 TiB should be enough to store some snap schedules...
> I suppose the snap_schedule module doesn't find the sqlite rados object.
> Is there a way I can verify its existence (and create it if needed) ?
>
> The error arrives when trying to restart old schedules
> (schedule_client.py line 169) and trying to find the old store, which
> does not exist, the schedules have been created in Pacific. Can I just
> wipe them out to recreate the schedules from scratch ?
>

The error does show up when trying to restart old schedules, but the
ioctx.stat() call at schedule_client.py:201 should have thrown a
rados.ObjectNotFound exception, which would have been caught at line 205.
That doesn't seem to be the case.
That implies that the backing rados object for the DB dump was found, but
there was a libcephsqlite error, as per your original email:

2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 cephsqlite: Open:
(client.444215) cannot open temporary database

Hence the error at line 203:
db.executescript(dump)

You could try cleaning up the old db dumps and restarting the cluster to
start with a clean slate. But I'd recommend that you back up the db dump
object first. The db dump object in your metadata pool should be named
snap_db_v0, so you would rename every snap_db_v0 to snap_db_v0.orig, for
all file systems.

Having said all this, I wouldn't recommend you do this at all.
The problem seems to be due to these missing bits:
pybind/mgr: use memory temp_store #48449
<https://github.com/ceph/ceph/pull/48449>

Unfortunately, integration testing is stalled due to infrastructure
problems. However, things are returning to normal and this will get
into a release at the earliest opportunity.

> Christian.
>
> > On Thu, Nov 17, 2022 at 2:53 PM phandaal <phandaal@xxxxxxxxxxxx> wrote:
> >
> >> Hi all,
> >>
> >> After upgrading from 16.2.10 to 17.2.5, the snap_schedule dashboard
> >> module does not start anymore (everything else is just fine).
> >> I had snapshots scheduled with this module in my cephfs, working
> >> perfectly on 16.2.10, but I couldn't find them anymore after the
> >> upgrade, due to the module being unavailable :
> >> # ceph fs snap-schedule status
> >> Error ENOENT: Module 'snap_schedule' is not available
> >>
> >> In the mgr startup logs I can find an error related to the sqlite
> >> database containing the schedules :
> >>
> >> 2022-11-17T09:50:23.489+0100 7f7bbfc976c0 0 [dashboard INFO request]
> >> [192.168.69.20:8696] [GET] [200] [0.011s] [phandaal] [107.0B]
> >> /ceph/api/mgr/module/snap_schedule
> >> 2022-11-17T09:50:23.499+0100 7f7be20db6c0 -1 client.444215:
> >> SimpleRADOSStriper: lock: snap_db_v0.db: waiting for locks: lockers
> >> exclusive=1 tag=
> >> lockers=[client.444152:35ac7693-032d-47a8-9d5c-4b71291a8158:v1:
> >> 192.168.69.20:0/937503739]
> >> 2022-11-17T09:50:24.189+0100 7f7be20db6c0 -1 client.444215:
> >> SimpleRADOSStriper: lock: snap_db_v0.db: waiting for locks: lockers
> >> exclusive=1 tag=
> >> lockers=[client.444152:35ac7693-032d-47a8-9d5c-4b71291a8158:v1:
> >> 192.168.69.20:0/937503739]
> >> 2022-11-17T09:50:24.859+0100 7f7be20db6c0 -1 client.444215:
> >> SimpleRADOSStriper: lock: snap_db_v0.db: waiting for locks: lockers
> >> exclusive=1 tag=
> >> lockers=[client.444152:35ac7693-032d-47a8-9d5c-4b71291a8158:v1:
> >> 192.168.69.20:0/937503739]
> >> 2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 cephsqlite: Open:
> >> (client.444215) cannot open temporary database
> >> 2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 mgr load Failed to
> >> construct class in 'snap_schedule'
> >> 2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 mgr load Traceback (most
> >> recent call last):
> >>   File "/usr/share/ceph/mgr/snap_schedule/module.py", line 38, in __init__
> >>     self.client = SnapSchedClient(self)
> >>   File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 169, in __init__
> >>     with self.get_schedule_db(fs_name) as conn_mgr:
> >>   File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 203, in get_schedule_db
> >>     db.executescript(dump)
> >> sqlite3.OperationalError: unable to open database file
> >>
> >> 2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 mgr operator() Failed to
> >> run module in active mode ('snap_schedule')
> >>
> >> I think the snap_schedule database has been moved into rados in
> >> Quincy, is there any way to manually create the database (empty) ?
> >>
> >> Regards,
> >> Christian.
> >>
> >> --
> >> Christian Vilhelm : phandaal@xxxxxxxxxxxx
> >> Reality is for people who lack imagination
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--
Milind
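
For illustration only, the temp_store setting named in the title of PR
#48449 can be tried out with Python's stock sqlite3 module. This is a
minimal sketch and not the mgr code: it uses an ordinary in-memory
database instead of the libcephsqlite VFS, and the restore_dump helper
and the schedules table are hypothetical names made up for the example.
It only shows the general pattern of setting the pragma before replaying
a SQL text dump with executescript(), the call that failed at line 203.

```python
import sqlite3


def restore_dump(dump: str) -> sqlite3.Connection:
    """Replay a SQL text dump into a fresh database (hypothetical helper)."""
    # Assumption: a plain in-memory database stands in for the
    # libcephsqlite-backed database used by the mgr module.
    db = sqlite3.connect(":memory:")
    # Ask SQLite to keep temporary tables and indices in memory rather
    # than in a separate temporary database file.
    db.execute("PRAGMA temp_store = MEMORY")
    db.executescript(dump)
    return db


# Build a small dump to replay; the table layout is made up for the example.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE schedules (path TEXT, schedule TEXT)")
source.execute("INSERT INTO schedules VALUES ('/volumes/a', '1h')")
source.commit()
dump = "\n".join(source.iterdump())

restored = restore_dump(dump)
rows = restored.execute("SELECT path, schedule FROM schedules").fetchall()
# PRAGMA temp_store reports 2 when the MEMORY setting is active.
temp_store = restored.execute("PRAGMA temp_store").fetchone()[0]
print(rows, temp_store)
```

Whether this setting alone clears the "cannot open temporary database"
error on a given cluster is not something the sketch can show; that
depends on the fix landing in a release, as noted above.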