Re: cephfs snap-mirror stalled

On Tue, Dec 6, 2022 at 6:34 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 06.12.22 09:54, Venky Shankar wrote:
> > Hi Holger,
> >
> > On Tue, Dec 6, 2022 at 1:42 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >> we have set up a snap-mirror for a directory on one of our clusters -
> >> running ceph version
> >>
> >> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
> >> (stable)
> >>
> >> to be mirrored to our other cluster - running ceph version
> >>
> >> ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> >> (stable)
> >>
> >> The initial setup went OK. When the first snapshot was created, data
> >> started to flow at a decent (for our HW) rate of 100-200MB/s. As the
> >> directory contains ~200TB, this was expected to take some time, but now
> >> the process has stalled completely after ~7 days of running, with
> >> ~100TB mirrored.
> >>
> >> So far I have no hints as to why it has stopped - I do not see any
> >> error messages from the cephfs-mirror daemon. Could the small version
> >> mismatch be a problem?
> >>
> >> Any hints on where to look to find out what has got stuck are welcome.
> >
> > I'd look at the mirror daemon logs for any errors to start with. You
> > might want to crank up the log level for debugging (debug
> > cephfs_mirror=20).
> >
>
> Even at the maximum debug level I do not see anything that looks like an
> error - but as this is the first time I have tried to dig into
> cephfs-mirror logs, I might not notice it (unless it is red and flashing).
>
> The log basically shows this type of sequence, repeating forever:
>
> (...)
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::Mirror update_fs_mirrors
> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror update (0x556fe3a7f130) after 2 seconds
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184655, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1) run: trying to pick from 1 directories
> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1) pick_directory
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184656, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::Mirror update_fs_mirrors
> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror update (0x556fe3a7fc70) after 2 seconds
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184657, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> (...)

Basically, the interesting bit is not captured, since it probably
happened some time back. Could you please set the following:

debug cephfs_mirror = 20
debug client = 20

and restart the mirror daemon? The daemon should then start
synchronizing again. When synchronization stalls, please share the
daemon logs. If the logs are huge, you could upload them via
ceph-post-file.
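For reference, a minimal sketch of the same two settings as a ceph.conf fragment. It assumes the mirror daemon runs as client.mirror, the entity name used in the CephFS mirroring documentation; cephadm-managed daemons get a generated name instead, so the section header is an assumption to adjust. The same options can also be set at runtime through the central config database, e.g. `ceph config set client.mirror debug_cephfs_mirror 20` and `ceph config set client.mirror debug_client 20`.

```ini
# Assumption: the cephfs-mirror daemon authenticates as client.mirror;
# replace the section name with your daemon's actual entity name.
[client.mirror]
    # verbose logging for the mirror daemon's own subsystem
    debug cephfs_mirror = 20
    # verbose logging for the libcephfs client the daemon uses to sync data
    debug client = 20
```

On a systemd-managed setup following those docs, the restart would be along the lines of `systemctl restart cephfs-mirror@mirror` (unit instance name assumed to match the client ID), and the upload `ceph-post-file -d "<description>" <logfile>`, which prints an ID to share on the list.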

>
>
>
> >>
> >> Regards,
> >> Holger
> >>
> >> --
> >> Dr. Holger Naundorf
> >> Christian-Albrechts-Universität zu Kiel
> >> Rechenzentrum / HPC / Server und Storage
> >> Tel: +49 431 880-1990
> >> Fax:  +49 431 880-1523
> >> naundorf@xxxxxxxxxxxxxx
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
> >
>



-- 
Cheers,
Venky
