Re: cephfs snap-mirror stalled

Venky Shankar <vshankar@xxxxxxxxxx> · Fri, 16 Dec 2022 14:11:47 +0530



On Fri, Dec 16, 2022 at 1:27 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
>
>
>
> On 15.12.22 14:06, Venky Shankar wrote:
> > Hi Holger,
> >
> > (sorry for the late reply)
> >
> > On Fri, Dec 9, 2022 at 6:22 PM Holger Naundorf <naundorf@xxxxxxxxxxxxxx> wrote:
> >>
> >> As an update:
> >> After the third restart the mirror daemon is now running normally again.
> >> The only change from the earlier restarts was that debug_client was set
> >> to 20 during the restart. (The first restart was after ~48h of no data
> >> movement to the receiver side and no changes, the second was with
> >> debug_mirror set to 20, and the third - maybe too quickly - after only
> >> ~12h of no data movement, with debug_mirror + _client set to 20.) I have
> >> since reset the debug levels to reduce the load on the system disks.
> >
> > Do you see ESTALE related messages in mirror daemon logs (with
> > debug_client: 20)? We have run into a couple of instances where the
> > mirror daemon would be stuck on a directory entry. The workaround for
> > that is to find the directory path (where the daemon is stuck) and
> > from another client (mount), list the entries in that directory (or at
> > times restarting the daemon works, as in your case).
> >
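A minimal sketch of the workaround described above (listing the entries of the stuck directory from another client mount). It assumes the file system is mounted somewhere on a second client; the function name is hypothetical and the path is whatever directory the daemon is stuck on, as seen through that mount. It does the same thing as a plain 'ls'.

```python
import os


def list_directory(path):
    """Return the sorted entry names in `path`.

    Iterating os.scandir() issues a readdir on the mount, which is the
    same operation an 'ls' in the stuck directory would perform.
    """
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)
```

Calling list_directory() with the stuck directory's path on the second mount is the whole workaround; the return value is only there so the listing can be inspected.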
> Sounds like our bug - I do have the ESTALE messages with debug 20:
>
> syslog.restart-mirror-client_debug_20.gz:Dec  7 10:05:30 ceph07
> bash[1740788]: debug 2022-12-07T09:05:30.299+0000 7f05907a7700 20
> client.29690499 got ESTALE on tid 2258671 from mds.0
> syslog.restart-mirror-client_debug_20.gz:Dec  7 10:05:30 ceph07
> bash[1740788]: debug 2022-12-07T09:05:30.299+0000 7f05907a7700 20
> client.29690499 got ESTALE on tid 2258671 from mds.0
> syslog.restart-mirror-client_debug_20.gz:Dec  7 10:05:30 ceph07
> bash[1740788]: debug 2022-12-07T09:05:30.299+0000 7f05907a7700 20
> client.29690499 got ESTALE on tid 2258671 from mds.0
> (...)
>
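To check a large gzipped log for this symptom without reading it by eye, something like the following may help. It is only a sketch: the regular expression is derived from the three sample lines quoted above, and the function names are made up for illustration. It assumes each message is a single line in the actual log file (the wrapping above comes from the mail client).

```python
import gzip
import re
from collections import Counter

# Pattern taken from the sample lines quoted above, e.g.
# "client.29690499 got ESTALE on tid 2258671 from mds.0"
ESTALE_RE = re.compile(r"client\.(\d+) got ESTALE on tid (\d+) from (mds\.\d+)")


def count_estale(lines):
    """Count ESTALE messages per (client id, tid, mds) triple."""
    counts = Counter()
    for line in lines:
        match = ESTALE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def count_estale_in_gzip(path):
    """Same as count_estale(), reading a gzipped log line by line."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return count_estale(handle)
```

A tid that shows up many times, as in the excerpt above, suggests the client keeps retrying the same request.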
> And when the mirror got stuck again after some time it mysteriously (at
> that time) started to work again while I was poking around in the
> logfiles and the system - and doing an 'ls' in the stuck dir was
> definitely included in the poking.
>
>
> > This will be fixed in the next pacific release (tracker:
> > https://tracker.ceph.com/issues/55935).
> >
> Does "next" mean the .11 release, or the current .10 that we have not yet
> upgraded to?

16.2.11
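For scripting the check across several hosts, here is a small sketch that compares a 'ceph version' string against 16.2.11. The function names are hypothetical, and it deliberately answers only for the pacific 16.2.x series, since that is the only release named in this thread.

```python
import re

FIXED_IN = (16, 2, 11)  # pacific release named above


def parse_ceph_version(text):
    """Extract (major, minor, patch) from a 'ceph version X.Y.Z (...)' string."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no ceph version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def has_fix(text):
    """True/False for pacific 16.2.x, None for any other series (unknown here)."""
    version = parse_ceph_version(text)
    if version[:2] != FIXED_IN[:2]:
        return None
    return version >= FIXED_IN
```

Both versions mentioned in this thread (16.2.7 and 16.2.9) come out as False.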

>
> Thanks for the update.
>
> Regards,
> Holger Naundorf
>
>
> >>
> >> Regards,
> >> Holger Naundorf
> >>
> >>
> >> On 07.12.22 15:53, Holger Naundorf wrote:
> >>> On 06.12.22 14:17, Venky Shankar wrote:
> >>>> On Tue, Dec 6, 2022 at 6:34 PM Holger Naundorf
> >>>> <naundorf@xxxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 06.12.22 09:54, Venky Shankar wrote:
> >>>>>> Hi Holger,
> >>>>>>
> >>>>>> On Tue, Dec 6, 2022 at 1:42 PM Holger Naundorf
> >>>>>> <naundorf@xxxxxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>> we have set up a snap-mirror for a directory on one of our clusters -
> >>>>>>> running ceph version
> >>>>>>>
> >>>>>>> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
> >>>>>>> (stable)
> >>>>>>>
> >>>>>>> to be mirrored to our other cluster - running ceph version
> >>>>>>>
> >>>>>>> ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> >>>>>>> (stable)
> >>>>>>>
> >>>>>>> The initial setup went ok: when the first snapshot was created, data
> >>>>>>> started to flow at a decent (for our HW) rate of 100-200MB/s. As the
> >>>>>>> directory contains ~200TB this was expected to take some time - but now
> >>>>>>> the process has stalled completely after ~100TB were mirrored and ~7d
> >>>>>>> of running.
> >>>>>>>
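As a rough sanity check on those numbers, the arithmetic can be sketched as below. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant rate; the function names are made up for illustration.

```python
def transfer_days(size_tb, rate_mb_s):
    """Days needed to move size_tb terabytes at a steady rate_mb_s MB/s."""
    seconds = (size_tb * 1e12) / (rate_mb_s * 1e6)
    return seconds / 86400


def average_rate_mb_s(size_tb, days):
    """Average MB/s implied by moving size_tb terabytes in the given days."""
    return (size_tb * 1e12) / (days * 86400) / 1e6
```

With the figures above, ~200TB at 100-200MB/s works out to roughly 12 to 23 days, and ~100TB in ~7d corresponds to an average of about 165MB/s, so the first half of the sync ran within the stated rate.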
> >>>>>>> Up to now I do not have any hints why it has stopped - I do not see any
> >>>>>>> error messages from the cephfs-mirror daemon. Can the small version
> >>>>>>> mismatch be a problem?
> >>>>>>>
> >>>>>>> Any hints where to look to find out what has got stuck are welcome.
> >>>>>>
> >>>>>> I'd look at the mirror daemon logs for any errors to start with. You
> >>>>>> might want to crank up the log level for debugging (debug
> >>>>>> cephfs_mirror=20).
> >>>>>>
> >>>>>
> >>>>> Even on max debug I do not see anything which looks like an error - but
> >>>>> as this is the first time I try to dig into any cephfs-mirror logs I
> >>>>> might not notice (as long as it is not red and flashing).
> >>>>>
> >>>>> The log is basically this type of sequence, repeating forever:
> >>>>>
> >>>>> (...)
> >>>>> cephfs::mirror::MirrorWatcher handle_notify
> >>>>> cephfs::mirror::Mirror update_fs_mirrors
> >>>>> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror
> >>>>> update (0x556fe3a7f130) after 2 seconds
> >>>>> cephfs::mirror::Watcher handle_notify: notify_id=751516198184655,
> >>>>> handle=93939050205568, notifier_id=25504530
> >>>>> cephfs::mirror::MirrorWatcher handle_notify
> >>>>> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1) run:
> >>>>> trying to pick from 1 directories
> >>>>> cephfs::mirror::PeerReplayer(19361031-928d-4366-99bd-50df70d3adf1)
> >>>>> pick_directory
> >>>>> cephfs::mirror::Watcher handle_notify: notify_id=751516198184656,
> >>>>> handle=93939050205568, notifier_id=25504530
> >>>>> cephfs::mirror::MirrorWatcher handle_notify
> >>>>> cephfs::mirror::Mirror update_fs_mirrors
> >>>>> cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror
> >>>>> update (0x556fe3a7fc70) after 2 seconds
> >>>>> cephfs::mirror::Watcher handle_notify: notify_id=751516198184657,
> >>>>> handle=93939050205568, notifier_id=25504530
> >>>>> cephfs::mirror::MirrorWatcher handle_notify
> >>>>> (...)
> >>>>
> >>>> Basically, the interesting bit is not captured since it probably
> >>>> happened sometime back. Could you please set the following:
> >>>>
> >>>> debug cephfs_mirror = 20
> >>>> debug client = 20
> >>>>
> >>>> and restart the mirror daemon? The daemon would start synchronizing
> >>>> again. When synchronizing stalls, please share the daemon logs. If the
> >>>> log is huge, you could upload them via ceph-post-file.
> >>>>
> >>> If I set debug_client to 20 'huge' is an understatement.
> >>>
> >>> I now have three huge logfiles - one pair with debug_mirror set to 20
> >>> capturing the restart and the point where the sync stalls again and one
> >>> with both mirror and client debug at 20 capturing the  restart - but as
> >>> this setting created ~10GB logs within 20min I reset the client logging
> >>> again to spare our small system disks - if these logs are needed I think
> >>> I will have to set up some remote logging facility.
> >>>
> >>> The observations I made from scanning the logs:
> >>>
> >>> After the restart the mirror daemon spends some hours comparing the
> >>> incomplete transfers, with lots of lines like:
> >>>
> >>> do_synchronize: 18 entries in stack
> >>> do_synchronize: top of stack path=./...(FILENAME)...
> >>> do_synchronize: entry=./(FILENAME), data_sync=0, attr_sync=1
> >>>
> >>> Then there is the point where the number of items in the stack goes down:
> >>> cephfs::mirror::PeerReplayer () do_synchronize: 8 entries in stack
> >>>
> >>> The top of stack moves up in the directory levels:
> >>>
> >>> cephfs::mirror::PeerReplayer () do_synchronize: top of stack
> >>> path=./...FILENAME...
> >>>
> >>> but then it just stops, without any error message visible in the logfile,
> >>> and switches to the repeating sequence I posted already.
> >>>
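To find the point in a large log where the sync stops, one option is to track the "entries in stack" messages and note the last one. This is a sketch only: the pattern is taken from the do_synchronize lines quoted above, and the function names are hypothetical.

```python
import re

# Pattern taken from the quoted lines, e.g.
# "cephfs::mirror::PeerReplayer () do_synchronize: 8 entries in stack"
STACK_RE = re.compile(r"do_synchronize: (\d+) entries in stack")


def stack_depths(lines):
    """Yield (line_number, depth) for every stack-depth message."""
    for lineno, line in enumerate(lines, start=1):
        match = STACK_RE.search(line)
        if match:
            yield lineno, int(match.group(1))


def last_sync_activity(lines):
    """Return (line_number, depth) of the last stack-depth message, or None."""
    last = None
    for item in stack_depths(lines):
        last = item
    return last
```

The line number returned by last_sync_activity() marks where do_synchronize last reported progress; the lines just after it are the ones worth sharing.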
> >>> Should I try to upload the logs - even gzipped they are quite huge:
> >>>    388M syslog.restart-mirror-client_debug_20.gz
> >>>     98M syslog.restart-mirror-no_client_debug.gz
> >>>     54M syslog.stalled-no_client_debug.gz
> >>>
> >>> (As our servers are in an isolated net I will have to see if the
> >>> 'ceph-post-file' method works from another system.)
> >>>
> >>> Regards,
> >>> Holger Naundorf
> >>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Holger
> >>>>>>>
> >>>>>>> --
> >>>>>>> Dr. Holger Naundorf
> >>>>>>> Christian-Albrechts-Universität zu Kiel
> >>>>>>> Rechenzentrum / HPC / Server und Storage
> >>>>>>> Tel: +49 431 880-1990
> >>>>>>> Fax:  +49 431 880-1523
> >>>>>>> naundorf@xxxxxxxxxxxxxx
> >>>>>>> _______________________________________________
> >>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
> >
>


-- 
Cheers,
Venky
