CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager

Sebastian Mazza <sebastian@xxxxxxxxxxx> · Thu, 27 Jan 2022 01:35:53 +0100

I have a problem with the snap_schedule MGR module. It seems to forget at least part of its configuration after the active MGR is restarted.
The following CLI commands (lines starting with ‘$’) and their stdout (lines starting with ‘>’) demonstrate the problem.

$ ceph fs snap-schedule add /shares/users 1h 2021-10-31T18:00
> Schedule set for path /shares/users
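
For reference when reading the status output and the DB dump further down: the dump stores the `1h` schedule as `repeat = 3600` (and `6h` as `21600`), and the first snapshot was reported at 2022-01-27T00:00:00 for a schedule created at 23:52:03 with start 2021-10-31T18:00. That matches "start plus a whole number of periods". The sketch below reproduces that arithmetic. `period_seconds` and `next_run` are hypothetical helpers, not the module's code, and only the `h` unit is confirmed by the dump; `d` and `w` are assumptions.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit table. Only "h" is confirmed by the dump ('1h' -> 3600,
# '6h' -> 21600); "d" and "w" are assumed for illustration.
UNIT_SECONDS = {"h": 3600, "d": 86400, "w": 604800}

def period_seconds(spec: str) -> int:
    """Convert a schedule spec such as '1h' into seconds."""
    m = re.fullmatch(r"(\d+)([hdw])", spec)
    if m is None:
        raise ValueError(f"unsupported schedule spec: {spec!r}")
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]

def next_run(start: datetime, period: int, now: datetime) -> datetime:
    """Return the earliest start + k*period (k >= 0) that is not before now."""
    if now <= start:
        return start
    step = timedelta(seconds=period)
    # (start - now) is negative, so floor division then negation is a ceiling.
    k = -((start - now) // step)
    return start + k * step
```

With `start=datetime(2021, 10, 31, 18, 0)`, `period=period_seconds("1h")` and `now=datetime(2022, 1, 26, 23, 52, 3)`, `next_run` lands on the next full hour, 2022-01-27T00:00:00.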

$ ceph fs snap-schedule retention add /shares/users 14h10d12m
> Retention added to path /shares/users
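
The retention spec `14h10d12m` is a run of `<count><unit>` pairs, and the status output below shows it as `{"h": 14, "d": 10, "m": 12}`. A minimal sketch of that split, assuming the spec is nothing but such pairs; `parse_retention` is a hypothetical helper, and it deliberately does not interpret what each unit letter means.

```python
import re

def parse_retention(spec: str) -> dict:
    """Split a retention spec such as '14h10d12m' into {'h': 14, 'd': 10, 'm': 12}."""
    pairs = re.findall(r"(\d+)([a-zA-Z])", spec)
    # Reject input that contains anything besides <number><letter> pairs.
    if not pairs or "".join(n + u for n, u in pairs) != spec:
        raise ValueError(f"malformed retention spec: {spec!r}")
    return {unit: int(count) for count, unit in pairs}
```

`json.dumps(parse_retention("14h10d12m"))` gives `{"h": 14, "d": 10, "m": 12}`, the same shape as the `retention` field before the restart.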

Wait until the next full hour.

$ ceph fs snap-schedule status /shares/users
> {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {"h": 14, "d": 10, "m": 12}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", "last_pruned": "2022-01-27T00:00:00", "created_count": 1, "pruned_count": 1, "active": true}

Now everything looks and works as expected. However, if I restart the active MGR, no new snapshots are created and the status command unexpectedly reports null (or empty/zero values) for some of the properties.

$ systemctl restart ceph-mgr@apollon.service

$ ceph fs snap-schedule status /shares/users
> {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path": "/shares/users", "schedule": "1h", "retention": {}, "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": null, "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0, "active": true}
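
To pin down exactly which fields were lost, the two status lines can be compared key by key. A small sketch; `changed_fields` is a hypothetical helper that just parses both JSON strings and reports the keys whose values differ.

```python
import json

def changed_fields(before: str, after: str) -> dict:
    """Return {field: (old, new)} for every key whose value differs."""
    old, new = json.loads(before), json.loads(after)
    return {key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)}
```

Fed the status output from before and after the restart, it reports `retention`, `first`, `last`, `last_pruned`, `created_count` and `pruned_count`; `schedule`, `start`, `created` and `active` are unchanged.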

I looked into the source file mgr/snap_schedule/fs/schedule.py. Since I have never used Python, I do not understand much of it, but I do understand the SQL it contains.
Therefore, I saved the SQLite DB dump before and after an MGR restart with the following commands:

List the RADOS objects to find the SQLite DB dump:
$ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule ls
> snap_db_v0

Copy the SQLite DB dump into a regular file:
$ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule get snap_db_v0 /tmp/snap_db_v0
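
Because the object is a plain SQL text dump, it can be replayed into a throwaway in-memory SQLite database and inspected with ordinary queries instead of reading the text by eye. A minimal sketch, assuming the file fetched above contains only SQL statements; `load_dump` and `table_counts` are hypothetical helpers built on Python's standard `sqlite3` module.

```python
import sqlite3
from pathlib import Path

def load_dump(path: str) -> sqlite3.Connection:
    """Replay a SQL text dump (e.g. /tmp/snap_db_v0) into an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript(Path(path).read_text(encoding="utf-8"))
    return db

def table_counts(db: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in the database."""
    names = [name for (name,) in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {name: db.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names}
```

For the dump shown below, `table_counts(load_dump("/tmp/snap_db_v0"))` would report three rows each in `schedules` and `schedules_meta`.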

To my surprise, the SQLite DB dump never contains the values for retention, first, last, and last_pruned.
The SQLite DB dump always looks like this:
————————————————
BEGIN TRANSACTION;
CREATE TABLE schedules(
        id INTEGER PRIMARY KEY ASC,
        path TEXT NOT NULL UNIQUE,
        subvol TEXT,
        retention TEXT DEFAULT '{}',
        rel_path TEXT NOT NULL
    );
INSERT INTO "schedules" VALUES(2,'/shares/groups',NULL,'{}','/shares/groups');
INSERT INTO "schedules" VALUES(3,'/shares/backup-clients',NULL,'{}','/shares/backup-clients');
INSERT INTO "schedules" VALUES(4,'/shares/users',NULL,'{}','/shares/users');
CREATE TABLE schedules_meta(
        id INTEGER PRIMARY KEY ASC,
        schedule_id INT,
        start TEXT NOT NULL,
        first TEXT,
        last TEXT,
        last_pruned TEXT,
        created TEXT NOT NULL,
        repeat INT NOT NULL,
        schedule TEXT NOT NULL,
        created_count INT DEFAULT 0,
        pruned_count INT DEFAULT 0,
        active INT NOT NULL,
        FOREIGN KEY(schedule_id) REFERENCES schedules(id) ON DELETE CASCADE,
        UNIQUE (schedule_id, start, repeat)
    );
INSERT INTO "schedules_meta" VALUES(2,2,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-21T11:41:35',3600,'1h',0,0,1);
INSERT INTO "schedules_meta" VALUES(3,3,'2021-10-31T13:30:00',NULL,NULL,NULL,'2022-01-21T11:41:41',21600,'6h',0,0,1);
INSERT INTO "schedules_meta" VALUES(4,4,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-26T23:52:03',3600,'1h',0,0,1);
COMMIT;
————————————————
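
To see the persisted state for one path in the same terms as the status output, the two tables can be joined on `schedules_meta.schedule_id = schedules.id`, using the column names exactly as they appear in the dump above. A sketch only; `persisted_state` is a hypothetical helper, and the query is not taken from the module.

```python
import sqlite3

# Column and table names are copied from the dump above.
STATUS_QUERY = """
SELECT s.path, s.retention, m.schedule, m.start, m.first, m.last,
       m.last_pruned, m.created_count, m.pruned_count, m.active
FROM schedules AS s
JOIN schedules_meta AS m ON m.schedule_id = s.id
WHERE s.path = ?
"""

def persisted_state(db: sqlite3.Connection, path: str) -> list:
    """Return the stored schedule rows for `path` as plain dicts."""
    db.row_factory = sqlite3.Row  # lets each row be converted with dict()
    return [dict(row) for row in db.execute(STATUS_QUERY, (path,))]
```

Applied to the rows for /shares/users in this dump, it returns `retention = '{}'`, `NULL` for first, last and last_pruned, and zero counters, which matches what `snap-schedule status` reports after the restart.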

Why is the information about retention, first, last, and last_pruned not part of the SQLite dump?
Is this the reason why my snapshot scheduling stops working after the active MGR is restarted?

My Ceph version is 16.2.6.

Thanks in advance,
Sebastian

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx