Re: Silent data corruption may destroy all the object copies after data migration

We did the migration because the old failure domain is running out of storage resources. We considered doing a deep scrub before the migration, but the cost is too high. In my opinion, doing a CRC check when generating the new copies may work.
Ceph does a CRC check when reading objects, so why does it not check when doing the migration?
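
As an illustration of that read-path check, here is a minimal sketch that forces a full read of every object in a pool before migrating (pool name "test" as in the experiment below). Note this only exercises each object's primary copy, and a full read pass is itself not cheap:

rados -p test ls | while read obj; do
    # a read verifies the stored checksum; a bad copy returns EIO
    rados -p test get "$obj" /dev/null || echo "read error on $obj"
done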

On 8/19/2018 22:47, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:

"one OSD's data to generate three copies on new failure domain" because
ceph assumes it is correct.

Get the PGs that are going to be moved and deep-scrub them? Something like the sketch below.
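
A minimal sketch of that, assuming the pool is named "test" (the column layout of ceph pg ls-by-pool differs between releases, so the awk filter may need adjusting):

for pg in $(ceph pg ls-by-pool test | awk '/^[0-9]+\./ {print $1}'); do
    ceph pg deep-scrub "$pg"
done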

I think the bigger problem is why these objects are inconsistent before
you even do the migration.


-----Original Message-----
From: poi [mailto:poiiiicen@xxxxxxxxx]
Sent: Sunday, 19 August 2018 16:41
To: ceph-users@xxxxxxxxxxxxxx
Subject: [ceph-users] Silent data corruption may destroy all the object
copies after data migration

Hello!

Recently, we migrated data from one failure domain to another by
changing the logical pool's crush_rule (ceph osd pool set <pool_name>
crush_rule <rule_name>). After the migration, we found that some
objects were corrupted, and all of their copies on different OSDs were
corrupted as well.

After running some experiments, we found that for a given object the
data migration uses only one OSD's copy to generate the three copies on
the new failure domain, and by default it does not check the CRC before
migrating. We tried both FileStore and BlueStore, and the result was
the same. Once silent data corruption appears on one OSD and that OSD
happens to be the data source for some objects during the migration,
those objects end up wrong on all copies.

Is there any way to ensure the correctness of the data during data
migration? We tried setting filestore_sloppy_crc or doing a deep scrub,
but the cost is too high.
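
When a suspect PG does get deep-scrubbed, the resulting inconsistencies can at least be listed and repaired; a minimal sketch (the PG id 1.a is only a placeholder):

-> # ceph pg deep-scrub 1.a
-> # rados list-inconsistent-obj 1.a --format=json-pretty
-> # ceph pg repair 1.a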

The experiment we did is something like below:

First we create some OSDs and divide them between two different roots.



-> # ceph osd tree
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF
-7       0.05699 root root1
-5       0.05699     host ceph01
 3   hdd 0.01900         osd.3       up  1.00000 1.00000
 4   hdd 0.01900         osd.4       up  1.00000 1.00000
 5   hdd 0.01900         osd.5       up  1.00000 1.00000
-1       0.05846 root default
-3       0.05846     host ceph02
 0   hdd 0.01949         osd.0       up  1.00000 1.00000
 1   hdd 0.01949         osd.1       up  1.00000 1.00000
 2   hdd 0.01949         osd.2       up  1.00000 1.00000



Then we create a replicated pool on root default. Note that we set the
failure domain to OSD (a sketch of the crush rules follows the command).



ceph osd pool create test 128 128
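
The crush rules themselves are not shown here; a plausible way to create them with OSD as the failure domain is the following (root1_rule is the name used for the migration below; default_osd_rule is a hypothetical name for the rule on root default):

ceph osd crush rule create-replicated default_osd_rule default osd
ceph osd crush rule create-replicated root1_rule root1 osd
ceph osd pool set test crush_rule default_osd_rule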



Next we put an object into the pool.



-> # cat txt
123
-> # rados -p test put test_copy txt
-> # rados -p test get test_copy -
123



Then we take OSD.0 down and modify its copy of the object test_copy.
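
To confirm which PG the object maps to and which OSDs hold it, one can run the following (the output format varies slightly by release; it prints the PG id together with the up and acting OSD sets):

-> # ceph osd map test test_copy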



-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy get-bytes
123
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 test_copy set-bytes 120txt

(The file 120txt contains the corrupted payload "120".)



Next we start OSD.0 again and trigger the data migration:



ceph osd pool set test crush_rule root1_rule
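
Before reading the object back, we wait until the backfill finishes, i.e. every PG in the pool is active+clean again; a minimal sketch (the header line of ceph pg ls-by-pool differs across releases):

while ceph pg ls-by-pool test | grep -qvE '^PG|active\+clean'; do
    sleep 10
done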



Finally, we try to read the object back with rados and with ceph-objectstore-tool:



-> # rados -p test get test_copy -
error getting test/test_copy: (5) Input/output error
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 test_copy get-bytes
120
-> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 test_copy get-bytes
120



The data of test_copy on OSD.3, OSD.4 and OSD.5 all came from OSD.0,
which had the silent data corruption.

Our config is below:

-> # cat /etc/ceph/ceph.conf
[global]
fsid = 96c5f802-ca66-4d12-974f-5b5658a18353
mon_initial_members = ceph00
mon_host = 10.18.192.27
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
public_network = 10.18.192.0/24

[mon]
mon_allow_pool_delete = true



The ceph version is below:

-> # ceph -v
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)



Thanks.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
