Hello Ceph happy users.

With this test I want to understand how Ceph protects my data and what I have to do in
certain failure situations. So let's begin.

== Preparation

ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

The test cluster:
  MON: 3
  OSD: 3
  File system: ZFS
  Kernel: 4.2.6

Preparing the pool:

# ceph osd pool create rbd 100
pool 'rbd' created
# ceph osd pool set rbd size 3
set pool 16 size to 3

RBD client:

# rbd create test --size 4G
# rbd map test
/dev/rbd0
# mkfs.ext2 /dev/rbd0
# mount /dev/rbd0 /mnt
# printf "aaaaaaaaaa\nbbbbbbbbbb" > /mnt/file

Searching for the PG that holds our file:

# grep "aaaaaaaaa" * -R
Binary file osd/nmz-0-journal/journal matches
Binary file osd/nmz-1/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-1-journal/journal matches
Binary file osd/nmz-0/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2-journal/journal matches

PG info:

# ceph pg ls
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
16.22 1 0 0 0 0 8192 2 2 active+clean 2016-02-19 08:46:11.157938 242'2 242:14 [2,1,0] 2 [2,1,0] 2 0'0 2016-02-19 08:45:38.006134 0'0 2016-02-19 08:45:38.006134

The primary copy of PG 16.22 is on osd.2. Let's take a checksum of the object file:

# md5sum osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

== Failure imitation #1

Let's corrupt the replica (non-primary) copies of the PG:

# sed -i -r 's/aaaaaaaaaa/abaaaaaaaa/g' osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# sed -i -r 's/aaaaaaaaaa/acaaaaaaaa/g' osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\99555c6c3ed07550b5fdfd2411b94fdd osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

Let's run a scrub to find the corruption:

# ceph osd scrub 0
7f8732f33700 0 log_channel(cluster) log [INF] : 16.63 scrub starts
7f873072e700 0 log_channel(cluster) log [INF] : 16.63 scrub ok
....
7f8732732700 0 log_channel(cluster) log [INF] : 16.2d scrub starts
7f8734f37700 0 log_channel(cluster) log [INF] : 16.2d scrub ok
7f8730f2f700 0 log_channel(cluster) log [INF] : 16.2b scrub starts
7f8733734700 0 log_channel(cluster) log [INF] : 16.2b scrub ok
7f8731730700 0 log_channel(cluster) log [INF] : 16.2a scrub starts
7f8733f35700 0 log_channel(cluster) log [INF] : 16.2a scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.25 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.25 scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.20 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.20 scrub ok
....
7f8734f37700 0 log_channel(cluster) log [INF] : 16.0 scrub ok

The scrub did not touch PG 16.22. Same result with osd.1.

# ceph osd deep-scrub 0

Same result again. So what exactly is the difference between scrub and deep-scrub -- time to google?
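(A note on scrub vs deep-scrub, as far as I understand it from the documentation -- this
is my assumption, not something I have verified in the code: a plain scrub only compares
object sizes and metadata between the replicas, while a deep scrub also reads the object
data and compares checksums, so only a deep scrub should be able to notice a silently
flipped byte like the one above. If that is right, the relevant per-PG command and the
options controlling the automatic schedule would be roughly:

# ceph pg deep-scrub 16.22                     # read and compare object data on every replica of this PG
# ceph daemon osd.0 config show | grep scrub   # inspect osd_scrub_* / osd_deep_scrub_interval settings on osd.0

Continuing with a plain per-PG scrub for now.)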
# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub

Only the primary copy gets checked, so I still don't know how to make Ceph check every
PG copy that lives on a given OSD.

== Failure imitation #2

Let's change which copies are corrupted: this time the copy on osd.0 is the good one and
the other two are corrupted.

# sed -i -r 's/aaaaaaaaaa/adaaaaaaaa/g' osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

# ceph osd scrub 2
osd.2 instructed to scrub
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

No error detected?

# ceph osd deep-scrub 2
osd.2 instructed to deep-scrub
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub starts
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub ok

Still no error detected? Let's check the file with md5:

# md5sum osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

Does the OSD serve this object from a cache? Let's restart osd.2.

-- After a successful restart

# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid 16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head missing attr _, missing attr snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 0 missing, 1 inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 1 errors

# ceph -s
    cluster 26fdb24b-9004-4e2b-a8d7-c28f45464084
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e7: 3 mons at {a=10.10.8.1:6789/0,b=10.10.8.1:6790/0,c=10.10.8.1:6791/0}
            election epoch 60, quorum 0,1,2 a,b,c
     osdmap e250: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v3172: 100 pgs, 1 pools, 143 MB data, 67 objects
            101 MB used, 81818 MB / 81920 MB avail
                  99 active+clean
                   1 active+clean+inconsistent

No automatic healing?

# ceph pg repair 16.22
instructing pg 16.22 on osd.2 to repair
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 repair starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid 16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head data_digest 0xd444e973 != known data_digest 0xb9b5bcf4 from auth shard 0, missing attr _, missing attr snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 0 missing, 1 inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 1 errors, 1 fixed

Let's check the checksums again:

# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

The primary copy is fixed, but the copy on osd.1 is left unchanged.
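(My guess, and it is only a guess: none of the scrubs so far actually read and compared
the object data on the osd.1 replica -- only a deep scrub does that -- so neither the
scrub nor the repair ever noticed that its data is stale. If that is right, a deep scrub
of the PG followed by another repair ought to catch and fix the osd.1 copy as well:

# ceph pg deep-scrub 16.22   # expected to also flag the data digest mismatch on the osd.1 replica
# ceph pg repair 16.22       # then repair again from the authoritative shard

As for auto healing, I have seen an osd_scrub_auto_repair option mentioned, but I believe
it only exists in releases newer than 9.2.0 and I have not tested it.)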
-- Tuning

Let's change the PG's primary OSD:

# ceph tell mon.* injectargs -- --mon_osd_allow_primary_temp=true
mon.a: injectargs:mon_osd_allow_primary_temp = 'true'
mon.b: injectargs:mon_osd_allow_primary_temp = 'true'
mon.c: injectargs:mon_osd_allow_primary_temp = 'true'
# ceph osd primary-temp 16.22 1
set 16.22 primary_temp mapping to 1

# ceph osd scrub 1
osd.1 instructed to scrub
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

No detection.

# ceph pg scrub 16.22
instructing pg 16.22 on osd.1 to scrub
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub ok

Still nothing. Let's check the md5 sums:

# md5sum osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447 osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2 osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10

The file is still corrupted.

So my questions are:

1. How can I scrub a whole OSD, not just a single PG? (A possible workaround is sketched below.)
2. Why does scrub not detect the corrupted files?
3. Does Ceph have an auto-heal option?
4. Does Ceph use some CRC mechanism to detect corrupted bits before returning data?
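Regarding question 1, the workaround I am planning to try, unless someone points me to a
built-in way, is to loop over every PG that has a copy on the OSD and deep-scrub each
one. This assumes "ceph pg ls-by-osd" behaves on 9.2.0 the way I expect (first column is
the PG id, pool 16 as in this test); I have not verified it yet:

# for pg in $(ceph pg ls-by-osd 1 | awk '{print $1}' | grep '^16\.'); do ceph pg deep-scrub "$pg"; done

If my understanding of deep-scrub is correct, that should force a data-level check of
every PG that has a copy on osd.1, whether osd.1 is primary for it or not.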