On Tue, Dec 6, 2022 at 6:34 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>
> On 06.12.22 09:54, Venky Shankar wrote:
> > Hi Holger,
> >
> > On Tue, Dec 6, 2022 at 1:42 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >> we have set up a snap-mirror for a directory on one of our clusters -
> >> running ceph version
> >>
> >> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
> >> (stable)
> >>
> >> to get mirrored to our other cluster - running ceph version
> >>
> >> ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> >> (stable)
> >>
> >> The initial setup went ok, when the first snapshot was created data
> >> started to flow at a decent (for our HW) rate of 100-200MB/s. As the
> >> directory contains ~200TB this was expected to take some time - but now
> >> the process has stalled completely after ~100TB were mirrored and ~7d
> >> running.
> >>
> >> Up to now I do not have any hints why it has stopped - I do not see any
> >> error messages from the cephfs-mirror daemon. Can the small version
> >> mismatch be a problem?
> >>
> >> Any hints where to look to find out what has got stuck are welcome.
> >
> > I'd look at the mirror daemon logs for any errors to start with. You
> > might want to crank up the log level for debugging (debug
> > cephfs_mirror=20).
> >
>
> Even on max debug I do not see anything which looks like an error - but
> as this is the first time I try to dig into any cephfs-mirror logs I
> might not notice (as long as it is not red and flashing).
>
> The log basically shows this type of sequence, repeating forever:
>
> (...)
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::Mirror update_fs_mirrors
> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror update (0x556fe3a7f130) after 2 seconds
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184655, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1) run: trying to pick from 1 directories
> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1) pick_directory
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184656, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> cephfs::mirror::Mirror update_fs_mirrors
> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror update (0x556fe3a7fc70) after 2 seconds
> cephfs::mirror::Watcher handle_notify: notify_id=751516198184657, handle=93939050205568, notifier_id=25504530
> cephfs::mirror::MirrorWatcher handle_notify
> (...)

Basically, the interesting bit is not captured, since it probably
happened some time back. Could you please set the following:

debug cephfs_mirror = 20
debug client = 20

and restart the mirror daemon? The daemon should then start
synchronizing again. When synchronization stalls, please share the
daemon logs. If the log is huge, you could upload it via
ceph-post-file.

>
> >
> >>
> >> Regards,
> >> Holger
> >>
> >> --
> >> Dr. Holger Naundorf
> >> Christian-Albrechts-Universität zu Kiel
> >> Rechenzentrum / HPC / Server und Storage
> >> Tel: +49 431 880-1990
> >> Fax: +49 431 880-1523
> >> naundorf@xxxxxxxxxxxxxx
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
>
> --
> Dr. Holger Naundorf
> Christian-Albrechts-Universität zu Kiel
> Rechenzentrum / HPC / Server und Storage
> Tel: +49 431 880-1990
> Fax: +49 431 880-1523
> naundorf@xxxxxxxxxxxxxx

--
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
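
Since the quoted log excerpt is dominated by a repeating idle sequence, one way to look for the "interesting bit" in a large debug log is to drop the lines that belong to that sequence and read what remains. The following is only a rough sketch under stated assumptions: the marker strings are taken from the excerpt quoted in this thread, the function names and the log path are hypothetical, and it assumes each log entry sits on a single line (the wrapping in the quoted excerpt comes from the mail client). It is not part of Ceph and does not know which messages actually indicate a stall.

```python
# Substrings of the idle-loop messages quoted in this thread.
# Assumption: anything containing one of these is routine noise.
IDLE_MARKERS = (
    "handle_notify",
    "update_fs_mirrors",
    "schedule_mirror_update_task",
    "pick_directory",
    "trying to pick from",
)


def interesting_lines(lines):
    """Yield non-empty log lines that are not part of the idle sequence."""
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if any(marker in text for marker in IDLE_MARKERS):
            continue  # skip the repeating idle messages
        yield text


def print_interesting(path):
    """Print the remaining lines of the log file at `path` (hypothetical path)."""
    with open(path, errors="replace") as log:
        for text in interesting_lines(log):
            print(text)
```

Calling `print_interesting()` with the path of the mirror daemon's log file prints only the lines outside the idle loop. With `debug client = 20` set, the remaining output will likely still be large, so this only narrows the search and does not replace sharing the full log.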