Hello Mykola and Eugen,

There was no interruption, and we are on a campus with a 10G backbone. We are on 12.2.10, I believe.

We wanted to check the data on the DR side, so we created a snapshot on the primary, and it was available on the DR side very quickly. That gave me the feeling that rbd-mirror is not stuck.

I will run those commands and restart rbd-mirror, then report back (a sketch of what I plan to run is appended below the quoted thread).

Thanks,
-Vikas

-----Original Message-----
From: Mykola Golub <to.my.trociny@xxxxxxxxx>
Sent: Thursday, February 18, 2021 2:51 PM
To: Vikas Rana <vrana@xxxxxxxxxxxx>; Eugen Block <eblock@xxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Data Missing with RBD-Mirror

On Thu, Feb 18, 2021 at 03:28:11PM +0000, Eugen Block wrote:
> Hi,
>
> was there an interruption between those sites?
>
> > last_update: 2021-01-29 15:10:13
>
> If there was an interruption you'll probably need to resync those images.

If the results shown below are current (not from some time in the past), then yes, it looks like the rbd-mirror (at least the image replayer) got stuck for some reason a long time ago.

In that case, though, I can't see how you could mount a newly created snapshot, because it would not have been replayed. Probably you previously had a snapshot with the same name, it was replayed, then the rbd-mirror got stuck, the snapshot was deleted on the primary, and a new one was created recently; and on the secondary you were still seeing and mounting the old snapshot? This would also explain why you were able to mount it: if data were really missing, I would expect the mount to fail due to filesystem corruption.

If the rbd-mirror just got stuck, you probably don't need to resync; restarting the rbd-mirror should make it start replaying again. Given how long it has not been replaying, though, if the journal is very large a resync might be faster. You can try:

  rbd journal info -p cifs --image research_data

to see how large the journal currently is (the difference between the master and the rbd-mirror client positions).

And if it really is the case that the rbd-mirror got stuck, any additional information you could provide (rbd-mirror logs, a core dump) would be helpful for fixing the bug. It can be reported directly in the tracker.

What version are you running, BTW?

--
Mykola Golub

> Zitat von Vikas Rana <vrana@xxxxxxxxxxxx>:
>
> > Hi Friends,
> >
> > We have a very weird issue with rbd-mirror replication. As per the command
> > output we are in sync, but the OSD usage on the DR side doesn't match the
> > prod side.
> >
> > On prod we are using close to 52TB, but on the DR side only 22TB.
> >
> > We took a snapshot on prod, mounted it on the DR side, and compared the
> > data, and we found a lot of data missing. Please see the output below.
> >
> > Please help us resolve this issue or point us in the right direction.
> >
> > Thanks,
> >
> > -Vikas
> >
> > DR# rbd --cluster cephdr mirror pool status cifs --verbose
> > health: OK
> > images: 1 total
> >     1 replaying
> >
> > research_data:
> >   global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
> >   state:       up+replaying
> >   description: replaying, master_position=[object_number=390133, tag_tid=4,
> >                entry_tid=447832541], mirror_position=[object_number=390133,
> >                tag_tid=4, entry_tid=447832541], entries_behind_master=0
> >   last_update: 2021-01-29 15:10:13
> >
> > DR# ceph osd pool ls detail
> > pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
> > pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
> > application rbd
> >         removed_snaps [1~5]
> >
> > PROD# ceph df detail
> > POOLS:
> >   NAME  ID  QUOTA OBJECTS  QUOTA BYTES  USED     %USED  MAX AVAIL  OBJECTS  DIRTY  READ     WRITE   RAW USED
> >   cifs  17  N/A            N/A          26.0TiB  30.10  60.4TiB    6860550  6.86M  873MiB   509MiB  52.1TiB
> >
> > DR# ceph df detail
> > POOLS:
> >   NAME  ID  QUOTA OBJECTS  QUOTA BYTES  USED     %USED  MAX AVAIL  OBJECTS  DIRTY  READ     WRITE   RAW USED
> >   cifs  5   N/A            N/A          11.4TiB  15.78  60.9TiB    3043260  3.04M  2.65MiB  431MiB  22.8TiB
> >
> > PROD#:/vol/research_data# du -sh *
> > 11T   Flab1
> > 346G  KLab
> > 1.5T  More
> > 4.4T  ReLabs
> > 4.0T  WLab
> >
> > DR#:/vol/research_data# du -sh *
> > 2.6T  Flab1
> > 14G   KLab
> > 52K   More
> > 8.0K  RLabs
> > 202M  WLab
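
To make it concrete, here is roughly what I intend to run, based on the suggestions above. This is only a sketch: the rbd-mirror systemd instance name and the --cluster argument are assumptions about our setup and may differ in other deployments.

  # How large is the journal backlog (master vs. rbd-mirror client position)?
  rbd journal info -p cifs --image research_data

  # Per-image mirroring status on the DR cluster
  rbd --cluster cephdr mirror image status cifs/research_data

  # Restart the rbd-mirror daemon on the DR side (instance name is deployment-specific)
  systemctl restart ceph-rbd-mirror@<instance>.service

  # If replay still does not resume, request a full resync of the image on the DR side
  rbd --cluster cephdr mirror image resync cifs/research_data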
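Separately, to double-check whether the DR copy really diverges from the primary (rather than the two pools just accounting space differently), I am thinking of exporting the same snapshot from both clusters and comparing checksums. Again only a sketch: the snapshot name below is a placeholder, and streaming an image this size will take a long time.

  # On the primary cluster
  rbd export cifs/research_data@verify-snap - | md5sum

  # On the DR cluster
  rbd --cluster cephdr export cifs/research_data@verify-snap - | md5sum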