Re: How to handle incomplete data after rbd import-diff failure?

Hi Maged,

On Thu, May 2, 2024 at 5:34 Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:

>
> On 01/05/2024 16:12, Satoru Takeuchi wrote:
> > I confirmed that incomplete data is left behind when `rbd import-diff`
> > fails. I guess this data is part of a snapshot. Could someone answer
> > the following questions?
> >
> > Q1. Is it safe to use the RBD image (e.g. client I/O and snapshot
> > management) even though incomplete data exists?
> > Q2. Is there any way to clean up the incomplete data?
> >
> > I read the following document and understand that this problem will be
> > resolved after running `rbd import-diff` again.
> >
> > https://ceph.io/en/news/blog/2013/incremental-snapshots-with-rbd/
> >> Since overwriting the same data is idempotent, it’s safe to have an
> >> import-diff interrupted in the middle.
> > However, that doesn't help if I can no longer access the exported
> > backup data. For instance, I'm afraid of the following scenario.
> >
> > 1. Send the backup data from one DC (DC0) to another DC (DC1)
> > periodically.
> > 2. The backup data is created in DC0 and sent directly to DC1
> > without being persisted as a file.
> > 3. A major power outage hits DC0, and the backup data cannot be
> > regenerated for a long time.
> >
> > I simulated this problem as follows:
> >
> > 1. Create an RBD image.
> > 2. Write some data to this image.
> > 3. Create a snapshot S0.
> > 4. Write more data to this image.
> > 5. Create a snapshot S1.
> > 6. Create backup data consisting of the difference between S0 and S1
> > by running rbd export-diff.
> > 7. Delete the last byte of the backup data, which is 'e' and marks the
> > end of the backup data, to inject an import-diff failure.
> > 8. Delete S1.
> > 9. Run rbd import-diff to apply the broken backup data created in
> > step 7.
> >
> > Step 9 failed and S1 was not created. However, the number of RADOS
> > objects and the storage usage increased.
> >
> > before:
> > ```
> > $ rados -p replicapool df
> > POOL_NAME      USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS      RD  WR_OPS      WR  USED COMPR  UNDER COMPR
> > replicapool  11 MiB       24       9      24                   0        0         0    3609  53 MiB     279  41 MiB         0 B          0 B
> >
> > total_objects    24
> > total_used       39 MiB
> > total_avail      32 GiB
> > total_space      32 GiB
> > ```
> >
> > after:
> > ```
> > $ rados -p replicapool df
> > POOL_NAME      USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS      RD  WR_OPS      WR  USED COMPR  UNDER COMPR
> > replicapool  12 MiB       25       9      25                   0        0         0    3531  53 MiB     278  41 MiB         0 B          0 B
> >
> > total_objects    25
> > total_used       40 MiB
> > total_avail      32 GiB
> > total_space      32 GiB
> > ```
> >
> > The incomplete data seems to keep growing if rbd import-diff fails
> > repeatedly. The following output was obtained after repeating step 9
> > 100 times.
> >
> > ```
> > $ rados -p replicapool df
> > POOL_NAME      USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS       WR  USED COMPR  UNDER COMPR
> > replicapool  12 MiB       25       9      25                   0        0         0    7925  104 MiB    1308  164 MiB         0 B          0 B
> >
> > total_objects    25
> > total_used       58 MiB
> > total_avail      32 GiB
> > total_space      32 GiB
> > ```
> >
> > Thanks,
> > Satoru
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> The image is not in a consistent state, so it should not be used as is.
> If you no longer have access to the source image or its exported data,
> you should be able to use the rbd snap rollback command to roll back the
> destination image to its last known good snapshot. The destination
> snapshots are created by the import-diff command with names matching the
> source snapshots.
>
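A minimal sketch of the rollback Maged describes. It assumes a bash shell on a host with the `rbd` CLI and admin credentials. The helper names `snap_exists` and `rollback_to_snap` are hypothetical. The name lookup assumes the default plain-text `rbd snap ls` layout: a header line first, the snapshot NAME in the second column, and no spaces in snapshot names. The helper refuses to roll back when the requested snapshot is absent. That check matters because, as reported above, the snapshot a failed import-diff would have created (S1 here) does not exist afterwards.

```shell
#!/usr/bin/env bash
# Sketch: after a failed `rbd import-diff`, roll the destination image back
# to its last known good snapshot (snapshots from earlier successful
# import-diff runs carry the source snapshot names).
set -eu

# CLI used to talk to the cluster; can be overridden for a dry run.
RBD="${RBD:-rbd}"

# Succeeds if snapshot $2 is listed for image spec $1 (pool/image).
# Assumes plain-text `rbd snap ls` output: header line, NAME in column 2.
snap_exists() {
  "$RBD" snap ls "$1" |
    awk -v s="$2" 'NR > 1 && $2 == s { found = 1 } END { exit !found }'
}

# Roll image spec $1 back to snapshot $2, refusing if the snapshot is missing.
rollback_to_snap() {
  if ! snap_exists "$1" "$2"; then
    echo "snapshot $2 not found on $1; nothing rolled back" >&2
    return 1
  fi
  "$RBD" snap rollback "$1@$2"
}
```

For the experiment in this thread the call would be `rollback_to_snap replicapool/<image> S0`. Here `<image>` stands for the image name, which the thread does not give. You would presumably stop client I/O on the image before rolling back.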

Thank you for the reply. I succeeded in rolling back the RBD image to S0,
and `total_objects` returned to its previous value (24).

On the other hand, `total_used` did not return to its original value.
Repeating the following steps resulted in continuous growth of `total_used`.

1. Import the broken diff (it fails).
2. Rollback to S0.

I suspect this is a resource leak.

Could you tell me whether there is a way to clean up this leftover
garbage data?
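One way to narrow down where the space goes is to compare the RADOS objects backing the image before and after a failed import and rollback round. In the outputs quoted above, the pool's own USED column stays at 12 MiB while `total_used` keeps growing, so the per-image object count is one figure worth watching next to the totals. This is a hedged, read-only sketch. It assumes a format 2 image, whose data objects are named `<block_name_prefix>.<object number>` with the prefix reported by `rbd info`. `block_prefix` and `count_data_objects` are hypothetical helpers, and nothing here deletes data.

```shell
#!/usr/bin/env bash
# Sketch: inspect the RADOS data objects that back one RBD image.
# Read-only: it lists and counts objects, it does not clean anything up.
set -eu

RBD="${RBD:-rbd}"
RADOS="${RADOS:-rados}"

# Print the block_name_prefix (e.g. "rbd_data.<id>") for POOL IMAGE.
block_prefix() {
  "$RBD" info "$1/$2" | awk '$1 == "block_name_prefix:" { print $2 }'
}

# Count objects in POOL whose names start with "<block_name_prefix>.".
count_data_objects() {
  local prefix
  prefix=$(block_prefix "$1" "$2")
  "$RADOS" -p "$1" ls |
    awk -v p="$prefix." 'index($0, p) == 1 { n++ } END { print n + 0 }'
}
```

For an individual object, `rados -p replicapool listsnaps <object>` shows which snapshot clones it still has. `rbd du replicapool/<image>` reports provisioned and used size for the image and each of its snapshots.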

Best,
Satoru