Re: Data Missing with RBD-Mirror

On Thu, Feb 18, 2021 at 03:28:11PM +0000, Eugen Block wrote:
> Hi,
> 
> was there an interruption between those sites?
> 
> >   last_update: 2021-01-29 15:10:13
> 
> If there was an interruption you'll probably need to resync those images.

If the results shown below are not from that past date then yes, it
looks like the rbd-mirror (at least the image replayer) got stuck for
some reason a long time ago. In that case, though, I can't see how you
could mount a newly created snap, because it would not have been
replayed.

Probably you previously had a snapshot with the same name, it was
replayed, then the rbd-mirror got stuck, the snapshot was deleted on
the primary and a new one was created recently. So on the secondary
you were still seeing and mounting the old snapshot?

This would also explain why you were able to mount it -- if data were
really missing, I would expect you would not be able to mount the fs
due to corruption.

If the rbd-mirror just got stuck then you probably don't need to
resync. Just restarting the rbd-mirror should make it start replaying
again. Though, given how long it was not replaying, if the journal is
very large, a resync might be faster.
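
For reference, a restart on the DR side would be something like the
following (the exact unit/instance name depends on how the rbd-mirror
daemon was deployed, so adjust it to your setup):

 systemctl restart ceph-rbd-mirror@<instance>

And if you decide to resync instead, it is requested against the
non-primary (DR) image, e.g.:

 rbd --cluster cephdr mirror image resync cifs/research_data

Keep in mind a resync recreates the secondary image, so the whole
image is copied over from the primary again.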

You can try:

 rbd journal info -p cifs --image research_data

to see how large the journal currently is (the difference between the
master and the rbd-mirror client positions).
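
If your version does not print the positions there, the journal
status command should also list the registered clients and their
commit positions:

 rbd journal status -p cifs --image research_data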

And if it really is the case that the rbd-mirror got stuck, any
additional info you could provide (rbd-mirror logs, a core dump)
might be helpful for fixing the bug. It can be reported directly to
the tracker.
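
To get more useful logs you could raise the rbd-mirror debug level
before restarting the daemon, e.g. in ceph.conf on the DR side (the
client section name depends on the id your daemon actually runs
with):

 [client.rbd-mirror.<instance>]
     debug rbd_mirror = 20
     debug journaler = 20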

What version are you running BTW?
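
Something like this should show it (ceph versions is available since
Luminous):

 ceph versions
 rbd --version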

-- 
Mykola Golub


> Quoting Vikas Rana <vrana@xxxxxxxxxxxx>:
> 
> > Hi Friends,
> > 
> > 
> > 
> > We have a very weird issue with rbd-mirror replication. As per the command
> > output we are in sync, but the OSD usage on the DR side doesn't match the
> > Prod side.
> > 
> > On Prod, we are using close to 52TB, but on the DR side we are only using 22TB.
> > 
> > We took a snap on Prod, mounted the snap on the DR side, and compared the
> > data, and we found a lot of data missing. Please see the output below.
> > 
> > 
> > 
> > Please help us resolve this issue or point us in the right direction.
> > 
> > 
> > 
> > Thanks,
> > 
> > -Vikas
> > 
> > 
> > 
> > DR# rbd --cluster cephdr mirror pool status cifs --verbose
> > 
> > health: OK
> > 
> > images: 1 total
> > 
> >     1 replaying
> > 
> > 
> > 
> > research_data:
> > 
> >   global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
> > 
> >   state:       up+replaying
> > 
> >   description: replaying, master_position=[object_number=390133, tag_tid=4,
> > entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4,
> > entry_tid=447832541], entries_behind_master=0
> > 
> >   last_update: 2021-01-29 15:10:13
> > 
> > 
> > 
> > DR# ceph osd pool ls detail
> > 
> > pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
> > pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
> > application rbd
> > 
> >         removed_snaps [1~5]
> > 
> > 
> > 
> > 
> > 
> > PROD# ceph df detail
> > 
> > POOLS:
> > 
> >     NAME     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS     DIRTY     READ        WRITE      RAW USED
> >     cifs     17     N/A               N/A             26.0TiB     30.10     60.4TiB       6860550     6.86M     873MiB      509MiB     52.1TiB
> > 
> > 
> > 
> > DR# ceph df detail
> > 
> > POOLS:
> > 
> >     NAME     ID     QUOTA OBJECTS     QUOTA BYTES     USED        %USED     MAX AVAIL     OBJECTS     DIRTY     READ        WRITE      RAW USED
> >     cifs     5      N/A               N/A             11.4TiB     15.78     60.9TiB       3043260     3.04M     2.65MiB     431MiB     22.8TiB
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > PROD#:/vol/research_data# du -sh *
> > 
> > 11T     Flab1
> > 
> > 346G    KLab
> > 
> > 1.5T    More
> > 
> > 4.4T    ReLabs
> > 
> > 4.0T    WLab
> > 
> > 
> > 
> > DR#:/vol/research_data# du -sh *
> > 
> > 2.6T    Flab1
> > 
> > 14G     KLab
> > 
> > 52K     More
> > 
> > 8.0K    RLabs
> > 
> > 202M    WLab
> > 
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


