Hello Mykola/Eugen,

Here's the output. We also restarted the rbd-mirror process

# rbd journal info -p cifs --image research_data
rbd journal '11cb6c2ae8944a':
        header_oid: journal.11cb6c2ae8944a
        object_oid_prefix: journal_data.17.11cb6c2ae8944a.
        order: 24 (16MiB objects)
        splay_width: 4

We restarted the rbd-mirror process on the DR side

# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
    1 replaying

research_data:
  global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
  state:       up+replaying
  description: replaying, master_position=[object_number=396351, tag_tid=4, entry_tid=455084955], mirror_position=[object_number=396351, tag_tid=4, entry_tid=455084955], entries_behind_master=0
  last_update: 2021-02-19 15:36:30

Thanks,
-Vikas

-----Original Message-----
From: Vikas Rana <vrana@xxxxxxxxxxxx>
Sent: Friday, February 19, 2021 2:00 PM
To: 'Mykola Golub' <to.my.trociny@xxxxxxxxx>; 'Eugen Block' <eblock@xxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Data Missing with RBD-Mirror

Hello Mykola and Eugen,

There was no interruption and we are in a campus with 10G backbone. We are on 12.2.10 I believe.

We wanted to check the data on DR side and then we created a snapshot on primary which was available on DR side very quickly. It kind of gave me feeling that rbd-mirror is not stuck.

I will run those commands and also restart the rbd-mirror and will report back.

Thanks,
-Vikas

-----Original Message-----
From: Mykola Golub <to.my.trociny@xxxxxxxxx>
Sent: Thursday, February 18, 2021 2:51 PM
To: Vikas Rana <vrana@xxxxxxxxxxxx>; Eugen Block <eblock@xxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Data Missing with RBD-Mirror

On Thu, Feb 18, 2021 at 03:28:11PM +0000, Eugen Block wrote:
> Hi,
>
> was there an interruption between those sites?
>
> > last_update: 2021-01-29 15:10:13
>
> If there was an interruption you'll probably need to resync those images.
If your results shown below are not from the past, then yes, it looks
like the rbd-mirror (at least the image replayer) got stuck for some
reason a long time ago. In that case, though, I can't see how you could
mount a newly created snap, because it would not have been replayed.
Probably you had a snapshot with the same name previously: it was
replayed, then the rbd-mirror got stuck, the snapshot was deleted on the
primary, and a new one was created recently. And on the secondary you
were still seeing and mounting the old snapshot?

This would also explain why you were able to mount it -- if data were
really missing, I would expect you not to be able to mount the fs due to
corruption.

If the rbd-mirror just got stuck then you probably don't need to
resync. Just restarting the rbd-mirror should make it start replaying
again. Though, given how long it was not replaying, if the journal is
very large, the resync might be faster.

You can try:

    rbd journal info -p cifs --image research_data

to see how large the journal currently is (the difference between the
master and the rbd-mirror client positions).

And if it really is the case that rbd-mirror got stuck, any additional
info you could provide (rbd-mirror logs, the core dump) might be helpful
for fixing the bug. It can be reported directly to the tracker.

What version are you running BTW?

--
Mykola Golub

> Quoting Vikas Rana <vrana@xxxxxxxxxxxx>:
>
> > Hi Friends,
> >
> > We have a very weird issue with rbd-mirror replication. As per the
> > command output, we are in sync but the OSD usage on DR side doesn't
> > match the Prod Side.
> >
> > On Prod, we are using close to 52TB but on DR side we are only 22TB.
> >
> > We took a snap on Prod and mounted the snap on DR side and compared
> > the data and we found a lot of missing data. Please see the output
> > below.
> >
> > Please help us resolve this issue or point us in the right direction.
> >
> > Thanks,
> >
> > -Vikas
> >
> > DR# rbd --cluster cephdr mirror pool status cifs --verbose
> > health: OK
> > images: 1 total
> >     1 replaying
> >
> > research_data:
> >   global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
> >   state:       up+replaying
> >   description: replaying, master_position=[object_number=390133, tag_tid=4,
> >                entry_tid=447832541], mirror_position=[object_number=390133,
> >                tag_tid=4, entry_tid=447832541], entries_behind_master=0
> >   last_update: 2021-01-29 15:10:13
> >
> > DR# ceph osd pool ls detail
> > pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
> > pg_num 128 pgp_num 128 last_change 1294 flags hashpspool
> > stripe_width 0 application rbd
> >         removed_snaps [1~5]
> >
> > PROD# ceph df detail
> > POOLS:
> > NAME  ID  QUOTA OBJECTS  QUOTA BYTES  USED     %USED  MAX AVAIL  OBJECTS  DIRTY  READ     WRITE   RAW USED
> > cifs  17  N/A            N/A          26.0TiB  30.10  60.4TiB    6860550  6.86M  873MiB   509MiB  52.1TiB
> >
> > DR# ceph df detail
> > POOLS:
> > NAME  ID  QUOTA OBJECTS  QUOTA BYTES  USED     %USED  MAX AVAIL  OBJECTS  DIRTY  READ     WRITE   RAW USED
> > cifs  5   N/A            N/A          11.4TiB  15.78  60.9TiB    3043260  3.04M  2.65MiB  431MiB  22.8TiB
> >
> > PROD#:/vol/research_data# du -sh *
> > 11T     Flab1
> > 346G    KLab
> > 1.5T    More
> > 4.4T    ReLabs
> > 4.0T    WLab
> >
> > DR#:/vol/research_data# du -sh *
> > 2.6T    Flab1
> > 14G     KLab
> > 52K     More
> > 8.0K    RLabs
> > 202M    WLab
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
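
A note on the status output discussed in this thread: the first report showed entries_behind_master=0 together with a last_update that was about three weeks old, which is what prompted the question of whether the image replayer had stuck. The following is only a rough sketch of how that check could be scripted over saved status text. The function names (parse_status, is_stale) and the five-minute threshold are illustrative assumptions, not part of rbd or rbd-mirror, and the parsing assumes the text layout shown in the messages above.

```python
import re
from datetime import datetime, timedelta

# Status text as posted in the thread (layout assumed from that output).
STATUS = """\
research_data:
  global_id:   69656449-61b8-446e-8b1e-6cf9bd57d94a
  state:       up+replaying
  description: replaying, master_position=[object_number=390133, tag_tid=4, entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4, entry_tid=447832541], entries_behind_master=0
  last_update: 2021-01-29 15:10:13
"""


def parse_status(text):
    """Return (entries_behind_master, last_update) from status text.

    Hypothetical helper: it only pattern-matches the two fields it needs.
    """
    behind = re.search(r"entries_behind_master=(\d+)", text)
    updated = re.search(
        r"last_update:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", text
    )
    if behind is None or updated is None:
        raise ValueError("status text does not contain the expected fields")
    last_update = datetime.strptime(updated.group(1), "%Y-%m-%d %H:%M:%S")
    return int(behind.group(1)), last_update


def is_stale(last_update, now, max_age=timedelta(minutes=5)):
    """True if the status was last refreshed more than max_age before now.

    The five-minute default is an arbitrary assumption for illustration.
    """
    return now - last_update > max_age


if __name__ == "__main__":
    behind, last_update = parse_status(STATUS)
    # Time of the reply that first pointed out the old last_update.
    now = datetime(2021, 2, 18, 15, 28, 11)
    if behind == 0 and is_stale(last_update, now):
        print("Reports in sync, but the status is stale since", last_update)
```

A stale last_update means the "in sync" figures describe the past, so entries_behind_master=0 on its own says little about the current state of replication.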