Re: rbd-mirror stops replaying journal on primary cluster

Hi,

Running a simple
`echo 1 > a; sync; rm a; sync; fstrim --all`
triggers the problem. There is no need to have the mount point mounted
with the discard option.
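
For anyone trying to reproduce it, the rough sequence looks like this
(a sketch only; it assumes a filesystem inside the guest on an
RBD-backed disk, and <pool>/<image> is the journaled image on the
primary cluster):

    # inside the guest: create and delete a file, then force a discard
    echo 1 > a; sync; rm a; sync; fstrim --all

    # on the primary cluster: check whether the journal clients keep advancing
    rbd --pool <pool> --image <image> journal status

After the fstrim, the journal status above is where the stuck replay
shows up for us.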

On Thu, Dec 8, 2022 at 12:33 AM Josef Johansson <josef86@xxxxxxxxx> wrote:
>
> Hi,
>
> I've updated https://tracker.ceph.com/issues/57396 with some more
> info. It seems that disabling discard within the guest solves the
> problem (or switching from virtio-scsi-single to virtio-blk on older
> kernels). I'm testing two different VMs on the same hypervisor with
> identical configs; one works, the other doesn't.
>
> Not sure what to make of it; it seems that kernels around 4.18+ are
> sending a weird discard?
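
A quick way to tell whether a given guest can issue discards at all
(and therefore whether it can hit this) is to check the discard
parameters of the virtual disk from inside the guest. This is only a
sketch; the relevant hypervisor-side knob is the drive's discard mode,
and the exact option names depend on your stack:

    # inside the guest: non-zero DISC-GRAN/DISC-MAX means discards reach the disk
    lsblk --discard

    # QEMU-style drive option (illustrative): discard=unmap passes guest
    # discards through to RBD, discard=ignore drops them in the hypervisor
    -drive file=rbd:<pool>/<image>,format=raw,if=none,id=drv0,discard=ignore
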
>
> On Tue, Aug 30, 2022 at 8:43 AM Josef Johansson <josef86@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > There's nothing special going on in the cluster when it stops
> > replaying. It seems that the local replayer hits a journal entry it
> > doesn't handle and just stops. Since it's the local replayer that
> > stops, there are no logs in rbd-mirror. The odd part is that
> > rbd-mirror handles this totally fine and is the one syncing
> > correctly.
> >
> > What's worse is that this is reported as HEALTHY in the status
> > information, even though restarting that VM will stall it until
> > replay is complete. The replay function inside the rbd client seems
> > to handle the journal fine, but only on start of the VM. I will try
> > to open a ticket on tracker.ceph.com as soon as my account is
> > approved.
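
For anyone else hitting this, the mismatch is easiest to see by
comparing the mirroring status with the raw journal client positions.
A minimal sketch, assuming <pool>/<image> as before; the mirror status
command is only there for contrast:

    # the mirroring status can look healthy even while local replay is stuck
    rbd mirror image status <pool>/<image>

    # the tell-tale sign: one registered client's entry_tid stops advancing
    rbd --pool <pool> --image <image> journal status
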
> >
> > I have tried to see what component is responsible for local replay but
> > I have not been successful yet.
> >
> > Thanks for answering :)
> >
> > On Mon, Aug 22, 2022 at 11:05 AM Eugen Block <eblock@xxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > IIRC the rbd mirror journals will grow if the sync stops working,
> > > which seems to be the case here. Does the primary cluster experience
> > > any high load when the replay stops? How is the connection between
> > > the two sites, and is the link saturated? Does the rbd-mirror log
> > > reveal anything useful (maybe also in debug mode)?
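
If anyone wants to dig further here, the debug knobs involved would
presumably be the rbd/journal subsystems on both the rbd-mirror node
and the client doing the local replay. A sketch only; the section
names depend on the cephx ids actually in use:

    # ceph.conf on the rbd-mirror host (restart the daemon afterwards)
    [client.rbd-mirror.<id>]
        debug rbd_mirror = 20

    # ceph.conf on the hypervisor, for the librbd client inside qemu
    [client]
        debug rbd = 20
        debug journaler = 20
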
> > >
> > > Regards,
> > > Eugen
> > >
> > > Quoting Josef Johansson <josef@xxxxxxxxxxx>:
> > >
> > > > Hi,
> > > >
> > > > I'm running Ceph Octopus 15.2.16 and I'm trying out two-way mirroring.
> > > >
> > > > Everything seems to be running fine, except that sometimes the
> > > > replay stops on the primary cluster.
> > > >
> > > > This means that VMs will not start properly until all journal
> > > > entries are replayed, but also that the journal grows over time.
> > > >
> > > > I am trying to find out why this occurs, and where to look for more
> > > > information.
> > > >
> > > > I am currently using rbd --pool <pool> --image <image> journal
> > > > status to see if the clients are in sync or not.
> > > >
> > > > Example output from when things went sideways:
> > > >
> > > > minimum_set: 0
> > > > active_set: 2
> > > > registered clients:
> > > > [id=, commit_position=[positions=[[object_number=0, tag_tid=1,
> > > > entry_tid=4592], [object_number=3, tag_tid=1, entry_tid=4591],
> > > > [object_number=2, tag_tid=1, entry_tid=4590], [object_number=1,
> > > > tag_tid=1, entry_tid=4589]]], state=connected]
> > > > [id=bdde9b90-df26-4e3d-84b3-66605dc45608,
> > > > commit_position=[positions=[[object_number=5, tag_tid=1,
> > > > entry_tid=19913], [object_number=4, tag_tid=1, entry_tid=19912],
> > > > [object_number=7, tag_tid=1, entry_tid=19911], [object_number=6,
> > > > tag_tid=1, entry_tid=19910]]], state=disconnected]
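
In output like the above, the gap between the two registered clients
is the thing to watch: here one client sits at entry_tid 4592 while
the other has reached 19913 and is marked disconnected. A crude way to
keep an eye on that gap over time, simply re-running the same command:

    # a growing entry_tid gap between the clients means replay is not keeping up
    watch -n 60 "rbd --pool <pool> --image <image> journal status"
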
> > > >
> > > > Right now I'm trying to catch it red-handed in the primary OSD
> > > > logs, but I'm not even sure that is the process replaying the
> > > > journal.
> > > >
> > > > Regards
> > > > Josef
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


