Re: Remove RBD mirror?

Jason Dillaman <jdillama@xxxxxxxxxx> · Tue, 9 Apr 2019 11:48:17 -0400

Any chance your rbd-mirror daemon has the admin sockets available
(defaults to /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok)? If
so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
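
A minimal sketch of that check, in case the exact socket name is not known. The helper names (find_mirror_asoks, mirror_status_all) and the "*client.*.asok" glob are assumptions made for illustration, based only on the default path pattern above; adjust the directory and pattern to your deployment.

```shell
#!/bin/sh
# Hypothetical helpers: list candidate admin sockets under a directory and
# query each one. The glob is an assumption derived from the default path
# /var/run/ceph/cephdr-client.<id>.<pid>.<random>.asok mentioned above.

find_mirror_asoks() {
    dir="${1:-/var/run/ceph}"
    for sock in "$dir"/*client.*.asok; do
        # Skip the unexpanded pattern when nothing matches.
        # (-e accepts any file type; use -S to insist on a real socket.)
        [ -e "$sock" ] || continue
        printf '%s\n' "$sock"
    done
}

mirror_status_all() {
    find_mirror_asoks "$1" | while IFS= read -r sock; do
        echo "== $sock"
        # Same command as above, run once per socket found.
        ceph --admin-daemon "$sock" rbd mirror status
    done
}
```

Running "mirror_status_all /var/run/ceph" on the host where rbd-mirror runs would then print the status for every matching socket. Other client sockets in that directory may match the glob as well, so expect some of them to reject the command.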

On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund <magnus@xxxxxxxxxxx> wrote:
>
>
>
> On Tue, Apr 9, 2019 at 17:14, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>>
>> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund <magnus@xxxxxxxxxxx> wrote:
>> >
>> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund <magnus@xxxxxxxxxxx> wrote:
>> > >>
>> > >> Hi,
>> > >> We have configured one-way replication of pools between a production cluster and a backup cluster. Unfortunately, the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach the replaying state.
>> > >
>> > >Hmm, it's odd that they don't at least reach the replaying state. Are
>> > >they still performing the initial sync?
>> >
>> > There are three pools we try to mirror (glance, cinder, and nova; no points for guessing what the cluster is used for :) ).
>> > The glance and cinder pools are smaller and see limited write activity, and their mirroring works. The nova pool, which is the largest and has 90% of the write activity, never leaves the "unknown" state.
>> >
>> > # rbd mirror pool status cinder
>> > health: OK
>> > images: 892 total
>> >     890 replaying
>> >     2 stopped
>> > #
>> > # rbd mirror pool status nova
>> > health: WARNING
>> > images: 2479 total
>> >     2479 unknown
>> > #
>> > The production cluster has 5k writes/s on average and the backup cluster has 1-2k writes/s on average. The production cluster is bigger and has better specs. I thought that the backup cluster would be able to keep up, but it looks like I was wrong.
>>
>> The fact that they are in the unknown state just means that the remote
>> "rbd-mirror" daemon hasn't started any journal replayers against the
>> images. If it couldn't keep up, it would still report a status of
>> "up+replaying". What Ceph release are you running on your backup
>> cluster?
>>
> The backup cluster is running Luminous 12.2.11 (the production cluster is running 12.2.10).
>
>>
>> > >> And the journals on the rbd volumes keep growing...
>> > >>
>> > >> Is it enough to simply disable mirroring of the pool (rbd mirror pool disable <pool>)? Will that remove the lagging reader from the journals and shrink them, or is there anything else that has to be done?
>> > >
>> > >You can either disable the journaling feature on the image(s), since
>> > >there is no point in leaving it on if you aren't using mirroring, or run
>> > >"rbd mirror pool disable <pool>" to purge the journals.
>> >
>> > Thanks for the confirmation.
>> > I will stop the mirror of the nova pool and try to figure out if there is anything we can do to get the backup cluster to keep up.
>> >
>> > >> Best regards
>> > >> /Magnus
>> > >> _______________________________________________
>> > >> ceph-users mailing list
>> > >> ceph-users@xxxxxxxxxxxxxx
>> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> > >
>> > >--
>> > >Jason
>>
>>
>>
>> --
>> Jason

-- 
Jason