Re: snap_schedule works after 1 hour of scheduling

Hi Milind, Team,

Thank you for your response, @Milind Changire <mchangir@xxxxxxxxxx>.

>>The only thing I can think of is a stale mgr that wasn't restarted
>>after an upgrade.
>>Was an upgrade performed lately ?

Yes, an upgrade was performed, after which we faced this. However, we were
facing this issue before the upgrade as well.
Another interesting thing we observed was that even after the upgrade, the
schedules that we had created before the upgrade were still running.

To rule this out, I installed a fresh cluster after purging the old one.
The commands used were as follows:
```
ansible-playbook -i hosts infrastructure-playbooks/purge-cluster.yml
ansible-playbook -i hosts site.yml
```

After this, kindly note the commands that we ran:
```
[root@storagenode-1 ~]# ceph mgr module enable snap_schedule
[root@storagenode-1 ~]# ceph config set mgr mgr/snap_schedule/log_level debug
[root@storagenode-1 ~]# sudo ceph fs subvolumegroup create cephfs subvolgrp
[root@storagenode-1 ~]# ceph fs subvolume create cephfs test subvolgrp
[root@storagenode-1 ~]# date
Thu Oct  5 04:23:09 UTC 2023
[root@storagenode-1 ~]# ceph fs snap-schedule add /volumes/subvolgrp/test 1h 2023-10-05T04:30:00
Schedule set for path /volumes/subvolgrp/test
[root@storagenode-1 ~]#  ceph fs snap-schedule list / --recursive=true
/volumes/subvolgrp/test 1h
[root@storagenode-1 ~]# ceph fs snap-schedule status /volumes/subvolgrp/test
{"fs": "cephfs", "subvol": null, "path": "/volumes/subvolgrp/test",
"rel_path": "/volumes/subvolgrp/test", "schedule": "1h", "retention": {},
"start": "2023-10-05T04:30:00", "created": "2023-10-05T04:23:39", "first":
null, "last": null, "last_pruned": null, "created_count": 0,
"pruned_count": 0, "active": true}
[root@storagenode-1 ~]# ceph fs subvolume info cephfs test subvolgrp
{
    "atime": "2023-10-05 04:20:18",
    "bytes_pcent": "undefined",
    "bytes_quota": "infinite",
    "bytes_used": 0,
    "created_at": "2023-10-05 04:20:18",
    "ctime": "2023-10-05 04:20:18",
    "data_pool": "cephfs_data",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "gid": 0,
    "mode": 16877,
    "mon_addrs": [
        "[abcd:abcd:abcd::34]:6789",
        "[abcd:abcd:abcd::35]:6789",
        "[abcd:abcd:abcd::36]:6789"
    ],
    "mtime": "2023-10-05 04:20:18",
    "path": "/volumes/subvolgrp/test/73d82b1a-6fb1-4160-a388-66b898967a85",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}
[root@storagenode-1 ~]#
[root@storagenode-1 ~]# ceph fs snap-schedule status /volumes/subvolgrp/test
{"fs": "cephfs", "subvol": null, "path": "/volumes/subvolgrp/test",
"rel_path": "/volumes/subvolgrp/test", "schedule": "1h", "retention": {"h":
4}, "start": "2023-10-05T04:30:00", "created": "2023-10-05T04:23:39",
"first": null, "last": null, "last_pruned": null, "created_count": 0,
"pruned_count": 0, "active": true}
[root@storagenode-1 ~]# date
Thu Oct  5 05:31:20 UTC 2023
[root@storagenode-1 ~]#
```
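
To double-check on the client side whether any snapshots exist, one can look
in the .snap directory of the scheduled path or list the subvolume snapshots
(a minimal sketch; the mount point /mnt/cephfs is only an example and assumes
the filesystem is mounted on a client):
```
# Scheduled snapshots, if created, appear under the .snap directory of the
# scheduled path (the mount point is an assumption)
ls /mnt/cephfs/volumes/subvolgrp/test/.snap
# Or list snapshots of the subvolume via the CLI
ceph fs subvolume snapshot ls cephfs test subvolgrp
```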

Could you please help us? Are we doing something wrong? The schedules are
still not getting created.
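
If it helps, this is roughly what we can collect next (the systemd unit name
ceph-mgr@storagenode-1 is an assumption based on our ceph-ansible deployment;
please let us know if other logs are needed):
```
# Confirm the snap_schedule module is reported by the mgr
ceph mgr module ls | grep -i snap_schedule
# Grep the active mgr's log for snap_schedule messages
journalctl -u ceph-mgr@storagenode-1 | grep -i snap_schedule
```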

Thanks and Regards,
Kushagra Gupta

On Wed, Oct 4, 2023 at 9:33 PM Milind Changire <mchangir@xxxxxxxxxx> wrote:

> On Wed, Oct 4, 2023 at 7:19 PM Kushagr Gupta
> <kushagrguptasps.mun@xxxxxxxxx> wrote:
> >
> > Hi Milind,
> >
> > Thank you for your swift response.
> >
> > >>How many hours did you wait after the "start time" and decide to
> restart mgr ?
> > We waited for ~3 days before restarting the mgr-service.
>
> The only thing I can think of is a stale mgr that wasn't restarted
> after an upgrade.
> Was an upgrade performed lately ?
>
> Did the dir exist at the time the snapshot was scheduled to take place?
> If it didn't then the schedule gets disabled until explicitly enabled.
>
> >
> > There was one more instance where we waited for 2 hours and then
> re-started and in the third hour the schedule started working.
> >
> > Could you please guide us if we are doing anything wrong.
> > Kindly let us know if any logs are required.
> >
> > Thanks and Regards,
> > Kushagra Gupta
> >
> > On Wed, Oct 4, 2023 at 5:39 PM Milind Changire <mchangir@xxxxxxxxxx>
> wrote:
> >>
> >> On Wed, Oct 4, 2023 at 3:40 PM Kushagr Gupta
> >> <kushagrguptasps.mun@xxxxxxxxx> wrote:
> >> >
> >> > Hi Team, Milind,
> >> >
> >> > Ceph-version: Quincy, Reef
> >> > OS: Almalinux 8
> >> >
> >> > Issue: snap_schedule works after 1 hour of schedule
> >> >
> >> > Description:
> >> >
> >> > We are currently working with a 3-node Ceph cluster, exploring the
> >> > scheduled snapshot capability of the ceph-mgr module.
> >> > To enable/configure scheduled snapshots, we followed the link below:
> >> >
> >> >
> >> >
> >> > https://docs.ceph.com/en/quincy/cephfs/snap-schedule/
> >> >
> >> >
> >> >
> >> > We were able to create snap schedules for the subvolumes as suggested.
> >> > But we have observed some very strange behaviour:
> >> > 1. The snap_schedules only work when we restart the ceph-mgr service
> >> > on the mgr node:
> >> > We then restarted the mgr service on the active mgr node, and after 1
> >> > hour the snapshots started getting created. I am attaching the log file
> >> > from after the restart. The behaviour looks abnormal.
> >>
> >> A mgr restart is not required for the schedule to get triggered.
> >> How many hours did you wait after the "start time" and decide to
> restart mgr ?
> >>
> >> >
> >> > So, for example, consider the output below:
> >> > ```
> >> > [root@storagenode-1 ~]# ceph fs snap-schedule status
> /volumes/subvolgrp/test3
> >> > {"fs": "cephfs", "subvol": null, "path": "/volumes/subvolgrp/test3",
> "rel_path": "/volumes/subvolgrp/test3", "schedule": "1h", "retention": {},
> "start": "2023-10-04T07:20:00", "created": "2023-10-04T07:18:41", "first":
> "2023-10-04T08:20:00", "last": "2023-10-04T09:20:00", "last_pruned": null,
> "created_count": 2, "pruned_count": 0, "active": true}
> >> > [root@storagenode-1 ~]#
> >> > ```
> >> > As we can see in the above output, we created the schedule at
> >> > 2023-10-04T07:18:41. The schedule was supposed to start at
> >> > 2023-10-04T07:20:00, but it started at 2023-10-04T08:20:00.
> >>
> >> seems normal behavior to me
> >> the schedule starts countdown for 1h from 2023-10-04T07:20:00 and
> >> created first snapshot at 2023-10-04T08:20:00
> >>
> >> >
> >> > Any input w.r.t the same will be of great help.
> >> >
> >> > Thanks and Regards
> >> > Kushagra Gupta
> >>
> >>
> >>
> >> --
> >> Milind
> >>
>
>
> --
> Milind
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



