On 2018-09-12 19:49:16-07:00 Jason Dillaman wrote:
Yes, that is where we are seeing the corruption. We have also noticed that different runs of export-diff seem to corrupt the data in different ways.
The filesystem was unmounted at the time of the export; our system is designed to only work on unmounted filesystems.

> On Wed, Sep 12, 2018 at 8:32 PM <patrick.mclean@xxxxxxxx> wrote:
> >
> > Hi Jason,
> >
> > On 2018-09-10 11:15:45-07:00 ceph-users wrote:
> >
> > On 2018-09-10 11:04:20-07:00 Jason Dillaman wrote:
> >
> > > In addition to this, we are seeing a similar type of corruption in another use case when we migrate RBDs and snapshots across pools. In this case we clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd export-diff/import-diff' to restore the last 3 snapshots on top. Here too we see cases of fsck and RBD checksum failures.
> > > We maintain various metrics and logs. Looking back at our data we have seen the issue at a small scale for a while on Jewel, but the frequency increased recently. The timing may have coincided with a move to Luminous, but this may be coincidence. We are currently on Ceph 12.2.5.
> > > We are wondering if people are experiencing similar issues with 'rbd export-diff / import-diff'. I'm sure many people use it to keep backups in sync. Since it is backups, many people may not inspect the data often. In our use case, we use this mechanism to keep data in sync and actually need the data in the other location often. We are wondering if anyone else has encountered any issues; it's quite possible that many people have this issue but simply don't realize it. We are likely hitting it much more frequently due to the scale of our operation (tens of thousands of syncs a day).
> >
> > If you are able to recreate this reliably without tiering, it would
> > assist in debugging if you could capture RBD debug logs during the
> > export along w/ the LBA of the filesystem corruption to compare
> > against.
> >
> > We haven't been able to reproduce this reliably as of yet; we haven't actually figured out the exact conditions that cause this to happen. We have just been seeing it happen on some percentage of export/import-diff operations.
> >
> > Logs from both export-diff and import-diff in a case where the result gets corrupted are attached. Please let me know if you need any more information.
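
For anyone trying to narrow down where a source and destination image diverge, one option is to compare full copies of the two images chunk by chunk and note the offsets that differ. The following is only a minimal sketch, assuming both images have already been exported in full to local files (for example with `rbd export`). The function name, the file paths, and the 4 MiB chunk size (chosen to match the default RBD object size) are illustrative assumptions, not part of any Ceph tooling.

```python
# Hypothetical helper: not part of Ceph. Compares two local files
# (e.g. full exports of the source and destination images) chunk by chunk.

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB, the default RBD object size


def find_mismatched_chunks(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return the byte offsets of every chunk that differs between two files.

    A chunk present in only one file (because the files differ in length)
    is also reported as a mismatch.
    """
    mismatches = []
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both files exhausted
            if chunk_a != chunk_b:
                mismatches.append(offset)
            offset += chunk_size
    return mismatches
```

The returned byte offsets could then be lined up against the locations that fsck reports (dividing by the sector size, if the filesystem tools report sectors) and against the extents that appear in the RBD debug logs.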
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com