Hi,

On 08/24/2018 09:25 AM, poi wrote:
Hello!

Recently, we migrated data from one CRUSH root to another because the resources of the original root were about to run out. After the migration, we found that some objects were corrupted, and that their copies on the other OSDs were corrupted as well.

We eventually found that, for a given PG, the migration uses the data from only one OSD to generate the three new copies, and does not check the CRC before copying, as if the data were always correct (which nobody can guarantee). We tried both filestore and bluestore, and the results are the same. Copying from a single replica of a PG without a CRC check seems unreliable.

Is there any way to ensure the correctness of the data during migration? We could run a deep scrub before migrating, but the cost is too high. I think adding a CRC check for objects before copying, during peering, may work.
In the current implementation, deep-scrub may also fail to detect silent data errors promptly when the object's data is cached by bluestore. Please see the commit message of this pull request:

https://github.com/ceph/ceph/pull/23629

When doing CRC checks, we should bypass the cache and read from disk directly.

Regards,
Xiaoguang Wang
The experiment we did is something like the following.

First, create some OSDs and divide them into two different roots.

    -> # ceph osd tree
    ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
    -7       0.05699 root root1
    -5       0.05699     host ceph01
     3   hdd 0.01900         osd.3       up  1.00000 1.00000
     4   hdd 0.01900         osd.4       up  1.00000 1.00000
     5   hdd 0.01900         osd.5       up  1.00000 1.00000
    -1       0.05846 root default
    -3       0.05846     host ceph02
     0   hdd 0.01949         osd.0       up  1.00000 1.00000
     1   hdd 0.01949         osd.1       up  1.00000 1.00000
     2   hdd 0.01949         osd.2       up  1.00000 1.00000

Then create a replicated pool on root default. Note that we set the failure domain to OSD.

    ceph osd pool create test 128 128

Next, we put an object into the pool.

    -> # cat txt
    123
    -> # rados -p test put test_copy txt
    -> # rados -p test get test_copy -
    123

Then we bring OSD.0 down and change its copy of the object test_copy.

    -> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy get-bytes
    123
    -> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy set-bytes 120txt

Next, we start OSD.0 and do the data migration.

    ceph osd pool set test crush_rule root1_rule

Finally, we try to get the object with rados and ceph-objectstore-tool.

    -> # rados -p test get test_copy -
    error getting test/test_copy: (5) Input/output error
    -> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 test_copy get-bytes
    120
    -> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 test_copy get-bytes
    120
    -> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 test_copy get-bytes
    120

The data of test_copy on OSD.3, OSD.4, and OSD.5 comes from OSD.0, which has the silent data corruption.
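To illustrate the kind of check being asked for, here is a minimal shell sketch that compares dumps of one object taken from several replicas and flags any copy whose checksum disagrees with the majority. It is only an offline illustration, not what Ceph does during peering or backfill. It assumes each replica's bytes have already been saved to a file (for instance by redirecting the get-bytes output to a file while that OSD is stopped); check_replicas and the file names are hypothetical, and cksum stands in for whatever CRC the object store actually uses.

```shell
#!/bin/sh
# Sketch only: majority vote over checksums of replica dumps.
# check_replicas is a hypothetical helper, not part of Ceph.
# Assumes file names contain no whitespace.

check_replicas() {
    # Build one "<crc> <file>" line per replica dump passed as an argument.
    sums=$(for f in "$@"; do
        printf '%s %s\n' "$(cksum < "$f" | cut -d' ' -f1)" "$f"
    done)

    # Treat the most frequent CRC as the good one. With no clear majority
    # (for example two copies that differ) the choice is arbitrary.
    good=$(printf '%s\n' "$sums" | cut -d' ' -f1 | sort | uniq -c |
           sort -rn | head -n 1 | awk '{print $2}')

    # Collect every file whose CRC differs from the majority CRC.
    bad=$(printf '%s\n' "$sums" | awk -v g="$good" '$1 != g {print $2}')

    if [ -n "$bad" ]; then
        echo "MISMATCH: $bad"
        return 1
    fi
    echo "OK: all replicas share crc $good"
}
```

Called as `check_replicas osd0.bin osd1.bin osd2.bin` on dumps taken before the crush_rule change in the experiment, this would be expected to report the OSD.0 dump as the mismatching copy, since the other two replicas still hold the original data at that point.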
Our config is below:

    -> # cat /etc/ceph/ceph.conf
    [global]
    fsid = 96c5f802-ca66-4d12-974f-5b5658a18353
    mon_initial_members = ceph00
    mon_host = 10.18.192.27
    auth_cluster_required = none
    auth_service_required = none
    auth_client_required = none
    public_network = 10.18.192.0/24

    [mon]
    mon_allow_pool_delete = true

The Ceph version is below:

    -> # ceph -v
    ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)

Thanks,
Jiahui Cen