Hello Ceph happy users.

With this test I want to understand how Ceph protects my data and what I have to do in
certain failure situations. So let's begin.

== Preparation

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

The test cluster:
  MON: 3
  OSD: 3
  File system: ZFS
  Kernel: 4.2.6

Preparing the pool:

# ceph osd pool create rbd 100
pool 'rbd' created
# ceph osd pool set rbd size 3
set pool 16 size to 3

RBD client:

# rbd create test --size 4G
# rbd map test
/dev/rbd0
# mkfs.ext2 /dev/rbd0
# mount /dev/rbd0 /mnt
# printf "aaaaaaaaaa\nbbbbbbbbbb" > /mnt/file

Searching for the PG that holds our file:

# grep "aaaaaaaaa" * -R
Binary file osd/nmz-0-journal/journal matches
Binary file osd/nmz-1/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-1-journal/journal matches
Binary file osd/nmz-0/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2-journal/journal matches

PG info:

# ceph pg ls
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
16.22 1 0 0 0 0 8192 2 2 active+clean 2016-02-19 08:46:11.157938 242'2 242:14 [2,1,0] 2 [2,1,0] 2 0'0 2016-02-19 08:45:38.006134 0'0 2016-02-19 08:45:38.006134

The primary copy of PG 16.22 is on osd.2. Let's take a checksum of the object file:

# md5sum osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

== Failure imitation #1

Let's corrupt the replica (non-primary) copies of the PG:

# sed -i -r 's/aaaaaaaaaa/abaaaaaaaa/g' osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# sed -i -r 's/aaaaaaaaaa/acaaaaaaaa/g' osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\99555c6c3ed07550b5fdfd2411b94fdd osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

Let's run a scrub to find the corruption:

# ceph osd scrub 0
7f8732f33700 0 log_channel(cluster) log [INF] : 16.63 scrub starts
7f873072e700 0 log_channel(cluster) log [INF] : 16.63 scrub ok
....
7f8732732700 0 log_channel(cluster) log [INF] : 16.2d scrub starts
7f8734f37700 0 log_channel(cluster) log [INF] : 16.2d scrub ok
7f8730f2f700 0 log_channel(cluster) log [INF] : 16.2b scrub starts
7f8733734700 0 log_channel(cluster) log [INF] : 16.2b scrub ok
7f8731730700 0 log_channel(cluster) log [INF] : 16.2a scrub starts
7f8733f35700 0 log_channel(cluster) log [INF] : 16.2a scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.25 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.25 scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.20 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.20 scrub ok
....
7f8734f37700 0 log_channel(cluster) log [INF] : 16.0 scrub ok

The scrub did not touch PG 16.22. Same result with osd.1.

# ceph osd deep-scrub 0

Same result again. So what exactly is the difference between scrub and deep-scrub -- time to google?
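(A note on scrub vs deep-scrub, as far as I understand it from the documentation -- this
is my assumption, not something I have verified in the code: a plain scrub only compares
object sizes and metadata between the replicas, while a deep scrub also reads the object
data and compares checksums, so only a deep scrub should be able to notice a silently
flipped byte like the one above. If that is right, the relevant per-PG command and the
options controlling the automatic schedule would be roughly:

# ceph pg deep-scrub 16.22                     # read and compare object data on every replica of this PG
# ceph daemon osd.0 config show | grep scrub   # inspect osd_scrub_* / osd_deep_scrub_interval settings on osd.0

Continuing with a plain per-PG scrub for now.)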
# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub

Only the primary copy gets checked, so I still don't know how to make Ceph check every
PG copy that lives on a given OSD.

== Failure imitation #2

Let's change which copies are corrupted: this time the copy on osd.0 is the good one and
the other two are corrupted.

# sed -i -r 's/aaaaaaaaaa/adaaaaaaaa/g' osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

# ceph osd scrub 2
osd.2 instructed to scrub
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

No error detected?

# ceph osd deep-scrub 2
osd.2 instructed to deep-scrub
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub starts
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub ok

Still no error detected? Let's check the file with md5:

# md5sum osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

Does the OSD serve this object from a cache? Let's restart osd.2.

-- After a successful restart

# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid 16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head missing attr _, missing attr snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 0 missing, 1 inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 1 errors

# ceph -s
    cluster 26fdb24b-9004-4e2b-a8d7-c28f45464084
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e7: 3 mons at {a=10.10.8.1:6789/0,b=10.10.8.1:6790/0,c=10.10.8.1:6791/0}
            election epoch 60, quorum 0,1,2 a,b,c
     osdmap e250: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v3172: 100 pgs, 1 pools, 143 MB data, 67 objects
            101 MB used, 81818 MB / 81920 MB avail
                  99 active+clean
                   1 active+clean+inconsistent

No automatic healing?

# ceph pg repair 16.22
instructing pg 16.22 on osd.2 to repair
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 repair starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid 16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head data_digest 0xd444e973 != known data_digest 0xb9b5bcf4 from auth shard 0, missing attr _, missing attr snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 0 missing, 1 inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 1 errors, 1 fixed

Let's check the checksums again:

# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

The primary copy is fixed, but the copy on osd.1 is left unchanged.
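(My guess, and it is only a guess: none of the scrubs so far actually read and compared
the object data on the osd.1 replica -- only a deep scrub does that -- so neither the
scrub nor the repair ever noticed that its data is stale. If that is right, a deep scrub
of the PG followed by another repair ought to catch and fix the osd.1 copy as well:

# ceph pg deep-scrub 16.22   # expected to also flag the data digest mismatch on the osd.1 replica
# ceph pg repair 16.22       # then repair again from the authoritative shard

As for auto healing, I have seen an osd_scrub_auto_repair option mentioned, but I believe
it only exists in releases newer than 9.2.0 and I have not tested it.)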
-- Tuning

Let's change the PG's primary OSD:

# ceph tell mon.* injectargs -- --mon_osd_allow_primary_temp=true
mon.a: injectargs:mon_osd_allow_primary_temp = 'true'
mon.b: injectargs:mon_osd_allow_primary_temp = 'true'
mon.c: injectargs:mon_osd_allow_primary_temp = 'true'
# ceph osd primary-temp 16.22 1
set 16.22 primary_temp mapping to 1

# ceph osd scrub 1
osd.1 instructed to scrub
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

No detection.

# ceph pg scrub 16.22
instructing pg 16.22 on osd.1 to scrub
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

Still nothing. Let's check the md5 sums:

# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

The file is still corrupted.

So my questions are:

1. How can I scrub a whole OSD, not just a single PG? (A possible workaround is sketched below.)
2. Why does scrub not detect the corrupted files?
3. Does Ceph have an auto-heal option?
4. Does Ceph use some CRC mechanism to detect corrupted bits before returning data?
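Regarding question 1, the workaround I am planning to try, unless someone points me to a
built-in way, is to loop over every PG that has a copy on the OSD and deep-scrub each
one. This assumes "ceph pg ls-by-osd" behaves on 9.2.0 the way I expect (first column is
the PG id, pool 16 as in this test); I have not verified it yet:

# for pg in $(ceph pg ls-by-osd 1 | awk '{print $1}' | grep '^16\.'); do ceph pg deep-scrub "$pg"; done

If my understanding of deep-scrub is correct, that should force a data-level check of
every PG that has a copy on osd.1, whether osd.1 is primary for it or not.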