Re: snap_schedule works after 1 hour of scheduling

Kushagr Gupta <kushagrguptasps.mun@xxxxxxxxx> · Thu, 5 Oct 2023 13:44:18 +0530

Hi Milind,

Thank you for your response.
Please find the logs attached, as instructed.

Thanks and Regards,
Kushagra Gupta

On Thu, Oct 5, 2023 at 12:09 PM Milind Changire <mchangir@xxxxxxxxxx> wrote:

> this is really odd
>
> Please run following commands and send over their outputs:
> # ceph status
> # ceph fs status
> # ceph report
> # ls -ld /<mount-path>/volumes/subvolgrp/test
> # ls -l /<mount-path>/volumes/subvolgrp/test/.snap
>
> On Thu, Oct 5, 2023 at 11:17 AM Kushagr Gupta
> <kushagrguptasps.mun@xxxxxxxxx> wrote:
> >
> > Hi Milind,Team
> >
> > Thank you for your response @Milind Changire
> >
> > >>The only thing I can think of is a stale mgr that wasn't restarted
> > >>after an upgrade.
> > >>Was an upgrade performed lately ?
> >
> > Yes an upgrade was performed after which we faced this. But we were
> facing this issue previously as well.
> > Another interesting thing which we observed was that even after the
> upgrade, the schedules that we created  before upgrade were still running.
> >
> > But to eliminate this, I installed a fresh cluster after purging the old
> one.
> > Commands used for as follows:
> > ```
> > ansible-playbook -i hosts infrastructure-playbooks/purge-cluster.yml
> > ansible-playbook -i hosts site.yml
> > ```
> >
> > After this kindly note the commands which we followed:
> > ```
> > [root@storagenode-1 ~]# ceph mgr module  enable snap_schedule
> > [root@storagenode-1 ~]# ceph config set mgr mgr/snap_schedule/log_level
> debug
> > [root@storagenode-1 ~]# sudo ceph fs subvolumegroup create cephfs
> subvolgrp
> > [root@storagenode-1 ~]# ceph fs subvolume create cephfs test subvolgrp
> > [root@storagenode-1 ~]# date
> > Thu Oct  5 04:23:09 UTC 2023
> > [root@storagenode-1 ~]# ceph fs snap-schedule add
> /volumes/subvolgrp/test 1h 2023-10-05T04:30:00
> > Schedule set for path /volumes/subvolgrp/test
> > [root@storagenode-1 ~]#  ceph fs snap-schedule list / --recursive=true
> > /volumes/subvolgrp/test 1h
> > [root@storagenode-1 ~]# ceph fs snap-schedule status
> /volumes/subvolgrp/test
> > {"fs": "cephfs", "subvol": null, "path": "/volumes/subvolgrp/test",
> "rel_path": "/volumes/subvolgrp/test", "schedule": "1h", "retention": {},
> "start": "2023-10-05T04:30:00", "created": "2023-10-05T04:23:39", "first":
> null, "last": null, "last_pruned": null, "created_count": 0,
> "pruned_count": 0, "active": true}
> > [root@storagenode-1 ~]# ceph fs subvolume info cephfs test subvolgrp
> > {
> >     "atime": "2023-10-05 04:20:18",
> >     "bytes_pcent": "undefined",
> >     "bytes_quota": "infinite",
> >     "bytes_used": 0,
> >     "created_at": "2023-10-05 04:20:18",
> >     "ctime": "2023-10-05 04:20:18",
> >     "data_pool": "cephfs_data",
> >     "features": [
> >         "snapshot-clone",
> >         "snapshot-autoprotect",
> >         "snapshot-retention"
> >     ],
> >     "gid": 0,
> >     "mode": 16877,
> >     "mon_addrs": [
> >         "[abcd:abcd:abcd::34]:6789",
> >         "[abcd:abcd:abcd::35]:6789",
> >         "[abcd:abcd:abcd::36]:6789"
> >     ],
> >     "mtime": "2023-10-05 04:20:18",
> >     "path":
> "/volumes/subvolgrp/test/73d82b1a-6fb1-4160-a388-66b898967a85",
> >     "pool_namespace": "",
> >     "state": "complete",
> >     "type": "subvolume",
> >     "uid": 0
> > }
> > [root@storagenode-1 ~]#
> > [root@storagenode-1 ~]# ceph fs snap-schedule status
> /volumes/subvolgrp/test
> > {"fs": "cephfs", "subvol": null, "path": "/volumes/subvolgrp/test",
> "rel_path": "/volumes/subvolgrp/test", "schedule": "1h", "retention": {"h":
> 4}, "start": "2023-10-05T04:30:00", "created": "2023-10-05T04:23:39",
> "first": null, "last": null, "last_pruned": null, "created_count": 0,
> "pruned_count": 0, "active": true}
> > [root@storagenode-1 ~]# date
> > Thu Oct  5 05:31:20 UTC 2023
> > [root@storagenode-1 ~]#
> > ```
> >
> > Could you please help us. Are we doing something wrong? Because still
> the schedules are not getting created.
> >
> > Thanks and Regards,
> > Kushagra Gupta
> >
> > On Wed, Oct 4, 2023 at 9:33 PM Milind Changire <mchangir@xxxxxxxxxx>
> wrote:
> >>
> >> On Wed, Oct 4, 2023 at 7:19 PM Kushagr Gupta
> >> <kushagrguptasps.mun@xxxxxxxxx> wrote:
> >> >
> >> > Hi Milind,
> >> >
> >> > Thank you for your swift response.
> >> >
> >> > >>How many hours did you wait after the "start time" and decide to
> restart mgr ?
> >> > We waited for ~3 days before restarting the mgr-service.
> >>
> >> The only thing I can think of is a stale mgr that wasn't restarted
> >> after an upgrade.
> >> Was an upgrade performed lately ?
> >>
> >> Did the dir exist at the time the snapshot was scheduled to take place.
> >> If it didn't then the schedule gets disabled until explicitly enabled.
> >>
> >> >
> >> > There was one more instance where we waited for 2 hours and then
> re-started and in the third hour the schedule started working.
> >> >
> >> > Could you please guide us if we are doing anything wrong.
> >> > Kindly let us know if any logs are required.
> >> >
> >> > Thanks and Regards,
> >> > Kushagra Gupta
> >> >
> >> > On Wed, Oct 4, 2023 at 5:39 PM Milind Changire <mchangir@xxxxxxxxxx>
> wrote:
> >> >>
> >> >> On Wed, Oct 4, 2023 at 3:40 PM Kushagr Gupta
> >> >> <kushagrguptasps.mun@xxxxxxxxx> wrote:
> >> >> >
> >> >> > Hi Team,Milind
> >> >> >
> >> >> > Ceph-version: Quincy, Reef
> >> >> > OS: Almalinux 8
> >> >> >
> >> >> > Issue: snap_schedule works after 1 hour of schedule
> >> >> >
> >> >> > Description:
> >> >> >
> >> >> > We are currently working in a 3-node ceph cluster.
> >> >> > We are currently exploring the scheduled snapshot capability of
> the ceph-mgr module.
> >> >> > To enable/configure scheduled snapshots, we followed the following
> link:
> >> >> >
> >> >> >
> >> >> >
> >> >> > https://docs.ceph.com/en/quincy/cephfs/snap-schedule/
> >> >> >
> >> >> >
> >> >> >
> >> >> > We were able to create snap schedules for the subvolumes as
> suggested.
> >> >> > But we have observed a two very strange behaviour:
> >> >> > 1. The snap_schedules only work when we restart the ceph-mgr
> service on the mgr node:
> >> >> > We then restarted the mgr-service on the active mgr node, and
> after 1 hour it started getting created. I am attaching the log file for
> the same after restart. Thre behaviour looks abnormal.
> >> >>
> >> >> A mgr restart is not required for the schedule to get triggered.
> >> >> How many hours did you wait after the "start time" and decide to
> restart mgr ?
> >> >>
> >> >> >
> >> >> > So,  for eg consider the below output:
> >> >> > ```
> >> >> > [root@storagenode-1 ~]# ceph fs snap-schedule status
> /volumes/subvolgrp/test3
> >> >> > {"fs": "cephfs", "subvol": null, "path":
> "/volumes/subvolgrp/test3", "rel_path": "/volumes/subvolgrp/test3",
> "schedule": "1h", "retention": {}, "start": "2023-10-04T07:20:00",
> "created": "2023-10-04T07:18:41", "first": "2023-10-04T08:20:00", "last":
> "2023-10-04T09:20:00", "last_pruned": null, "created_count": 2,
> "pruned_count": 0, "active": true}
> >> >> > [root@storagenode-1 ~]#
> >> >> > ```
> >> >> > As we can see in the above o/p, we created the schedule at
> 2023-10-04T07:18:41. The schedule was suppose to start at
> 2023-10-04T07:20:00 but it started at 2023-10-04T08:20:00
> >> >>
> >> >> seems normal behavior to me
> >> >> the schedule starts countdown for 1h from 2023-10-04T07:20:00 and
> >> >> created first snapshot at 2023-10-04T08:20:00
> >> >>
> >> >> >
> >> >> > Any input w.r.t the same will be of great help.
> >> >> >
> >> >> > Thanks and Regards
> >> >> > Kushagra Gupta
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Milind
> >> >>
> >>
> >>
> >> --
> >> Milind
> >>
>
>
> --
> Milind
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx