Thanks for your reply. The version we are running is Luminous 12.2.5, and we are actually using BlueStore with replicated pools. Our config is below:

-> # cat /etc/ceph/ceph.conf
[global]
fsid = 96c5f802-ca66-4d12-974f-5b5658a18353
mon_initial_members = ceph00
mon_host = 10.18.192.27
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
public_network = 10.18.192.0/24

[mon]
mon_allow_pool_delete = true

The experiment we did is as follows. First, we created some OSDs and divided them into two different CRUSH roots:

-> # ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-7       0.05699 root root1
-5       0.05699     host ceph01
 3   hdd 0.01900         osd.3       up  1.00000 1.00000
 4   hdd 0.01900         osd.4       up  1.00000 1.00000
 5   hdd 0.01900         osd.5       up  1.00000 1.00000
-1       0.05846 root default
-3       0.05846     host ceph02
 0   hdd 0.01949         osd.0       up  1.00000 1.00000
 1   hdd 0.01949         osd.1       up  1.00000 1.00000
 2   hdd 0.01949         osd.2       up  1.00000 1.00000

Then we created a replicated pool on root default. Note that we set the failure domain to OSD.

-> # ceph osd pool create test 128 128

Next we put an object into the pool:

-> # cat txt
123
-> # rados -p test put test_copy txt
-> # rados -p test get test_copy -
123

Then we took OSD.0 down and overwrote its copy of object test_copy (the replacement file 120txt contains "120" instead of "123"):

-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy get-bytes
123
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy set-bytes 120txt

Next we started OSD.0 again and triggered the data migration by moving the pool to the other root:

-> # ceph osd pool set test crush_rule root1_rule

Finally we tried to read the object back with rados and ceph-objectstore-tool:

-> # rados -p test get test_copy -
error getting test/test_copy: (5) Input/output error
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 test_copy get-bytes
120

The copies of test_copy on OSD.3, OSD.4 and OSD.5 all came from OSD.0, which carried the silent data corruption.

Regards,
Poi

Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Fri, Aug 31, 2018 at 12:51 AM:
>
> On Thu, Aug 23, 2018 at 8:38 AM, poi <poiiiicen@xxxxxxxxx> wrote:
> > Hello!
> >
> > Recently we did a data migration from one CRUSH root to another, and
> > afterwards we found that some objects were wrong and their copies on
> > other OSDs were also wrong.
> >
> > Eventually we found that, for one PG, the data migration uses only one
> > OSD's data to generate the three new copies, and does not check the
> > CRC before migration, effectively assuming the data is always correct
> > (but nobody can actually promise that). We tried both FileStore and
> > BlueStore, and the results were the same. Copying from one PG without
> > a CRC check may lack reliability.
>
> Exactly what version are you running, and what backends? Are you
> actually using BlueStore?
>
> This is certainly the general case with replicated pools on FileStore,
> but it shouldn't happen with BlueStore or EC pools at all. We aren't
> going to implement "voting" on FileStore-backed OSDs though, as that
> would vastly multiply the cost of backfilling. :(
> -Greg
>
> >
> > Is there any way to ensure the correctness of the data during a data
> > migration? We could deep-scrub before migrating, but the cost is too
> > high. I think adding a CRC check for objects before copying, during
> > peering, might work.
> >
> > Regards
> >
> > Poi
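
P.S. For completeness, the steps above do not show how the CRUSH rules were created. It was along these lines (the name default_osd_rule here is only illustrative; root1_rule is the rule referenced above, and both rules use osd as the failure domain):

-> # ceph osd crush rule create-replicated default_osd_rule default osd
-> # ceph osd crush rule create-replicated root1_rule root1 osd
-> # ceph osd pool set test crush_rule default_osd_rule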
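
The corruption can also be surfaced by hand with a deep scrub before the migration; this is the workaround whose cost worries us. A sketch, with <pgid> standing for the placement group that ceph osd map reports for the object:

-> # ceph osd map test test_copy
-> # ceph pg deep-scrub <pgid>
-> # rados list-inconsistent-obj <pgid> --format=json-pretty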
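
Also, since our ceph.conf above does not override bluestore_csum_type, the OSDs should be using the default crc32c checksums; that can be confirmed per OSD with:

-> # ceph daemon osd.0 config get bluestore_csum_type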