Hey Sebastian,

On Thu, Jan 27, 2022 at 6:06 AM Sebastian Mazza <sebastian@xxxxxxxxxxx> wrote:
>
> I have a problem with the snap_schedule MGR module. It seems to forget
> at least parts of the configuration after the active MGR is restarted.
> The following CLI commands (lines starting with ‘$’) and their stdout
> (lines starting with ‘>’) demonstrate the problem.
>
> $ ceph fs snap-schedule add /shares/users 1h 2021-10-31T18:00
> > Schedule set for path /shares/users
>
> $ ceph fs snap-schedule retention add /shares/users 14h10d12m
> > Retention added to path /shares/users
>
> Wait until the next complete hour.
>
> $ ceph fs snap-schedule status /shares/users
> > {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {"h": 14, "d": 10, "m": 12}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", "last_pruned": "2022-01-27T00:00:00", "created_count": 1, "pruned_count": 1, "active": true}
>
> Now everything looks and works as expected. However, if I restart the
> active MGR, no new snapshots are created and the status command
> unexpectedly reports null for some of the properties.
>
> $ systemctl restart ceph-mgr@apollon.service
>
> $ ceph fs snap-schedule status /shares/users
> > {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}

That looks like a bug. Another similar issue is reported here:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7K4T2HI72NJPB6UWEMZAYEUN4MORBL6O/

Could you please file a tracker here:
https://tracker.ceph.com/projects/cephfs/issues/new

It would help if you could enable debug logging for ceph-mgr, repeat the
steps you mention above, and upload the log in the tracker.
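Something along these lines should do it (a sketch; level 20 is very
verbose, so remember to revert it afterwards):

$ ceph config set mgr debug_mgr 20

Then restart the active MGR, wait past the next full hour, collect the
log (typically under /var/log/ceph/ceph-mgr.<name>.log), and revert:

$ ceph config rm mgr debug_mgr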
> I did look into the source file mgr/snap_schedule/fs/schedule.py. Since
> I have never used Python I do not understand much, but I do understand
> the SQL code that is given.
> Therefore, I saved the SQLite DB dump before and after a MGR restart
> with the following commands:
>
> List the RADOS objects in order to find the SQLite DB dump:
> $ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule ls
> > snap_db_v0
>
> Copy the SQLite DB dump into a regular file:
> $ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule get snap_db_v0 /tmp/snap_db_v0
>
> To my surprise, the SQLite DB dump never contains the information for
> retention, first, last, and last_pruned.
> The SQLite DB dump always looks like this:
> ————————————————
> BEGIN TRANSACTION;
> CREATE TABLE schedules(
> id INTEGER PRIMARY KEY ASC,
> path TEXT NOT NULL UNIQUE,
> subvol TEXT,
> retention TEXT DEFAULT '{}',
> rel_path TEXT NOT NULL
> );
> INSERT INTO "schedules" VALUES(2,'/shares/groups',NULL,'{}','/shares/groups');
> INSERT INTO "schedules" VALUES(3,'/shares/backup-clients',NULL,'{}','/shares/backup-clients');
> INSERT INTO "schedules" VALUES(4,'/shares/users',NULL,'{}','/shares/users');
> CREATE TABLE schedules_meta(
> id INTEGER PRIMARY KEY ASC,
> schedule_id INT,
> start TEXT NOT NULL,
> first TEXT,
> last TEXT,
> last_pruned TEXT,
> created TEXT NOT NULL,
> repeat INT NOT NULL,
> schedule TEXT NOT NULL,
> created_count INT DEFAULT 0,
> pruned_count INT DEFAULT 0,
> active INT NOT NULL,
> FOREIGN KEY(schedule_id) REFERENCES schedules(id) ON DELETE CASCADE,
> UNIQUE (schedule_id, start, repeat)
> );
> INSERT INTO "schedules_meta" VALUES(2,2,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-21T11:41:35',3600,'1h',0,0,1);
> INSERT INTO "schedules_meta" VALUES(3,3,'2021-10-31T13:30:00',NULL,NULL,NULL,'2022-01-21T11:41:41',21600,'6h',0,0,1);
> INSERT INTO "schedules_meta" VALUES(4,4,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-26T23:52:03',3600,'1h',0,0,1);
> COMMIT;
> ————————————————
>
> Why is the information about retention, first, last, and last_pruned
> not part of the SQLite dump?

I expect it to be a part of the above query. Most likely it's a bug.

> Is this the reason why my snapshot scheduling stops working after the
> active MGR is restarted?
>
> My ceph version is: 16.2.6
>
> Thanks in advance,
> Sebastian

--
Cheers,
Venky
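PS: since snap_db_v0 is a plain SQL text dump, you can also load it into
a local SQLite database and query what actually got persisted. A quick
sketch (assuming the sqlite3 CLI is available; /tmp/snap.db is just a
scratch file, and the table/column names are taken from your dump):

$ sqlite3 /tmp/snap.db < /tmp/snap_db_v0
$ sqlite3 /tmp/snap.db "SELECT path, retention FROM schedules;"
$ sqlite3 /tmp/snap.db "SELECT schedule_id, schedule, first, last, last_pruned FROM schedules_meta;"

If retention is '{}' and first/last/last_pruned are NULL there as well,
then the runtime state is never being written back into the RADOS
object, which would match what you see after a MGR restart.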