Re: RBD Mirror Image Resync

On Fri, Mar 22, 2019 at 8:38 AM Vikas Rana <vrana@xxxxxxxxxxxx> wrote:
>
> Hi Jason,
>
> Thank you for your help and support.
>
>
> One last question: after the demotion and promotion, when you do a resync again, does it copy the whole image again or send just the changes since the last journal update?

Right now, it will copy the entire image. There is still a longer-term
plan to get support from the OSDs for deeply deleting a backing
object, which would be needed when a snapshot exists on the image and
you need to resync the non-HEAD revision. Once that support is in
place, we can tweak the resync logic to copy only the deltas by
comparing hashes of the objects.
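
As a very rough illustration of the estimate (the sustained copy rate
depends entirely on your network and cluster, so the 500 MB/s figure
below is purely an assumed placeholder):

    200 TB ≈ 200 * 1024 * 1024 MB ≈ 209,715,200 MB
    209,715,200 MB / 500 MB/s ≈ 419,430 s ≈ 116.5 hours ≈ ~5 days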

> I'm trying to estimate how long it will take to get a 200 TB image in sync.
>
> Thanks,
> -Vikas
>
>
> -----Original Message-----
> From: Jason Dillaman <jdillama@xxxxxxxxxx>
> Sent: Wednesday, March 13, 2019 4:49 PM
> To: Vikas Rana <vrana@xxxxxxxxxxxx>
> Subject: Re:  RBD Mirror Image Resync
>
> On Wed, Mar 13, 2019 at 4:42 PM Vikas Rana <vrana@xxxxxxxxxxxx> wrote:
> >
> > Thanks Jason for your response.
> >
> > From the documentation, I believe the resync has to be run where the rbd-mirror daemon is running.
> > The rbd-mirror daemon is running on the DR site, and that's where we issued the resync.
>
> You would need the rbd-mirror daemon configured and running against both clusters. The "resync" request just adds a flag to the specified image, which the local "rbd-mirror" daemon discovers and then starts pulling the image down from the remote cluster. So again, the correct procedure is to initiate the resync against the out-of-sync image you want to delete/recreate, wait for it to complete, then demote the current primary image, and promote the newly resynced image to primary.
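>
> For illustration only, the sequence would look roughly like this (an untested sketch; <primary-site> and <out-of-sync-site> are placeholders for whichever of your two clusters currently holds the primary and the out-of-sync copy, and the pool/image name is just the one from this thread):
>
>   # flag the out-of-sync, non-primary copy for a full resync
>   rbd --cluster <out-of-sync-site> mirror image resync nfs/dir_research
>
>   # wait for the resync to complete before touching the primary
>   rbd --cluster <out-of-sync-site> mirror image status nfs/dir_research
>
>   # then swap roles: demote the current primary, promote the resynced copy
>   rbd --cluster <primary-site> mirror image demote nfs/dir_research
>   rbd --cluster <out-of-sync-site> mirror image promote nfs/dir_research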
>
> Should we do it on the Prod site?
> > Here's the Prod status
> > :~# rbd info nfs/dir_research
> > rbd image 'dir_research':
> >         size 200 TB in 52428800 objects
> >         order 22 (4096 kB objects)
> >         block_name_prefix: rbd_data.edd65238e1f29
> >         format: 2
> >         features: layering, exclusive-lock, journaling
> >         flags:
> >         journal: edd65238e1f29
> >         mirroring state: enabled
> >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> >         mirroring primary: true
>
> Are you sure this is the prod site? The image id is different from the dump below.
>
> >
> >
> > What does "starting_replay" mean?
>
> Given that the state is "down+unknown", I think it's just an odd, left-over status message. The "down" indicates that you do not have a functional "rbd-mirror" daemon running against cluster "cephdr". If it is running, I would check its log messages to see if any errors are being spit out.
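>
> A quick sketch of how to check that on the DR node (the exact systemd unit name and log location depend on how rbd-mirror was deployed, so treat these as assumptions):
>
>   # is the rbd-mirror daemon actually running?
>   systemctl status ceph-rbd-mirror@<instance>
>
>   # tail its log for errors (default log directory; adjust if customized)
>   tail -f /var/log/ceph/ceph-client.rbd-mirror.<instance>.log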
>
> > Thanks,
> > -Vikas
> >
> > -----Original Message-----
> > From: Jason Dillaman <jdillama@xxxxxxxxxx>
> > Sent: Wednesday, March 13, 2019 3:44 PM
> > To: Vikas Rana <vrana@xxxxxxxxxxxx>
> > Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> > Subject: Re:  RBD Mirror Image Resync
> >
> > On Tue, Mar 12, 2019 at 11:09 PM Vikas Rana <vrana@xxxxxxxxxxxx> wrote:
> > >
> > > Hi there,
> > >
> > >
> > >
> > > We are replicating a RBD image from Primary to DR site using RBD mirroring.
> > >
> > > On Primary, we were using 10.2.10.
> >
> > Just a note that Jewel is end-of-life upstream.
> >
> > > The DR site is Luminous, and we promoted the DR copy to test the failure scenario. Everything checked out fine.
> > >
> > >
> > >
> > > Now we are trying to restart the replication. We demoted and then
> > > resynced the image, but it has been stuck in the "starting_replay"
> > > state for the last 3 days. It's a 200 TB RBD image.
> >
> > You would need to run "rbd --cluster <primary-site> mirror image resync nfs/dir_research" and wait for that to complete *before* demoting the primary image on cluster "cephdr". Without a primary image, there is nothing to resync against.
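> >
> > Once a resync is actually running, the per-image status is the easiest place to watch progress (sketch only; the exact wording of the status output varies by release):
> >
> >   rbd --cluster cephdr mirror image status nfs/dir_research
> >   rbd --cluster cephdr mirror pool status nfs --verbose
> >
> > The state/description should move past "starting_replay" once a functional rbd-mirror daemon is connected and copying.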
> >
> > >
> > >
> > > :~# rbd --cluster cephdr mirror pool status nfs --verbose
> > > health: WARNING
> > > images: 1 total
> > >     1 starting_replay
> > >
> > > dir_research:
> > >   global_id:   3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >   state:       down+unknown
> > >   description: status not found
> > >   last_update:
> > >
> > > # rbd info nfs/dir_research
> > > rbd image 'dir_research':
> > >         size 200TiB in 52428800 objects
> > >         order 22 (4MiB objects)
> > >         block_name_prefix: rbd_data.652186b8b4567
> > >         format: 2
> > >         features: layering, exclusive-lock, journaling
> > >         flags:
> > >         create_timestamp: Thu Feb  7 11:53:36 2019
> > >         journal: 652186b8b4567
> > >         mirroring state: disabling
> > >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >         mirroring primary: false
> > >
> > > So the question is: how do we know the progress of the replay, how much has already completed, and is there any way to estimate when it will go back to the OK state?
> > >
> > >
> > >
> > >
> > > Thanks,
> > >
> > > -Vikas
> > >
> > >
> > >
> >
> >
> >
> > --
> > Jason
> >
>
>
> --
> Jason
>


-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



