Re: Copying without crc check when peering may lack reliability

On Thu, 30 Aug 2018, Gregory Farnum wrote:
> On Thu, Aug 23, 2018 at 8:38 AM, poi <poiiiicen@xxxxxxxxx> wrote:
> > Hello!
> >
> > Recently, we did data migration from one crush root to another, but
> > after that, we found some objects were wrong and their copies on other
> > OSDs were also wrong.
> >
> > Finally, we found that for one pg, the data migration uses only one
> > OSD's data to generate three new copies, and does not check the crc
> > before migration, as if assuming the data is always correct (but
> > actually nobody can promise that). We tried both filestore and
> > bluestore, and the results were the same. Copying from one pg without
> > a crc check may lack reliability.
> 
> Exactly what version are you running, and what backends? Are you
> actually using BlueStore?
> 
> This is certainly the general case with replicated pools on FileStore,
> but it shouldn't happen with BlueStore or EC pools at all. We aren't
> going to implement "voting" on FileStore-backed OSDs though as that
> would vastly multiply the cost of backfilling. :(

If I'm understanding correctly, and assuming you're running BlueStore, the 
checksums on each OSD are (internally) correct (so reads succeed), but the 
actual data stored on the different OSDs is different.  And then you 
trigger a rebalance or recovery.  Is that right?

This is something a deep scrub will normally pick up in due time, but in a 
recovery case, we assume the existing replicas are consistent and base any 
new replica off of the primary's copy.
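
To illustrate the distinction, here is a toy model (not Ceph code; the
names stored, local_read_ok and deep_scrub are made up for this sketch,
and CRC32 from zlib merely stands in for whatever checksum the backend
uses).  Each copy passes its own checksum, yet a cross-replica comparison
of the kind a deep scrub performs still flags the copy that disagrees:

```python
import zlib
from collections import Counter

def stored(data: bytes) -> dict:
    """Model one OSD's copy: the payload plus the checksum written with it."""
    return {"data": data, "crc": zlib.crc32(data)}

def local_read_ok(replica: dict) -> bool:
    """Per-OSD check: does the payload still match its own stored checksum?"""
    return zlib.crc32(replica["data"]) == replica["crc"]

def deep_scrub(replicas: dict) -> list:
    """Cross-replica check: list the OSDs whose digest differs from the majority."""
    digests = {osd: zlib.crc32(r["data"]) for osd, r in replicas.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(osd for osd, d in digests.items() if d != majority)

# Hypothetical PG with three copies; osd.0 holds different (but self-consistent) data.
replicas = {
    "osd.0": stored(b"divergent payload"),
    "osd.1": stored(b"expected payload"),
    "osd.2": stored(b"expected payload"),
}

every_read_succeeds = all(local_read_ok(r) for r in replicas.values())
inconsistent = deep_scrub(replicas)
```

In this model every individual read succeeds, so nothing on the read path
would notice the problem; only the comparison across copies does.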

The way to "fix" that might be to do a deep-scrub on an object before it 
is recovered.  I'm pretty sure we don't want to incur that kind of 
overhead (it would slow recovery way down).  And in the recovery case 
where you are down one or more replicas, you're always going to have to 
base the new replica off an existing replica.  In theory, if you're 
making multiple replicas, you could source them from different 
existing copies, but in practice that's inefficient and doesn't align 
with how the OSD implements recovery (everything is push/pull from the 
primary).
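
A rough sketch of the trade-off, again as a toy model rather than
anything resembling the OSD code (recover_from_primary and
recover_with_vote are hypothetical helpers, and the "vote" is the kind of
comparison Greg mentions above): pushing from the primary reads one
copy, while comparing first has to read every existing copy of every
object.

```python
from collections import Counter

def recover_from_primary(copies: dict, primary: str, targets: list):
    """Push the primary's copy to every target; only one existing copy is read."""
    reads = 1
    return {t: copies[primary] for t in targets}, reads

def recover_with_vote(copies: dict, targets: list):
    """Read all existing copies, take the majority value, then push that."""
    reads = len(copies)
    winner, _ = Counter(copies.values()).most_common(1)[0]
    return {t: winner for t in targets}, reads

# Hypothetical object whose primary copy (osd.0) differs from its peers.
copies = {"osd.0": b"divergent", "osd.1": b"expected", "osd.2": b"expected"}
targets = ["osd.7", "osd.8", "osd.9"]

pushed, reads_push = recover_from_primary(copies, "osd.0", targets)
voted, reads_vote = recover_with_vote(copies, targets)
```

Here the push-from-primary path copies the divergent data to all three
new OSDs with a single read, while the voting path picks the majority
value at the price of reading every existing copy.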

sage



> -Greg
> 
> >
> > Is there any way to ensure the correctness of data during data
> > migration? We could do a deep scrub before migration, but the
> > cost is too high. I think adding a crc check for objects before
> > copying, at peering time, may work.
> >
> > Regards
> >
> > Poi
> 
> 


