Hello,
I wanted to find out (in a lab Ceph setup) what exactly happens
when part of the data on an OSD disk gets corrupted. I created a simple
test where I scanned the block device until I found something that
resembled user data (using dd and hexdump); /dev/sdd is the block
device used by the OSD:
INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
00000000  6e 20 69 64 3d 30 20 65 78 65 3d 22 2f 75 73 72  |n id=0 exe="/usr|
00000010  2f 73 62 69 6e 2f 73 73 68 64 22 20 68 6f 73 74  |/sbin/sshd" host|
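For reference, with bs=32 and skip=33920 the absolute byte offset of that region is just skip times bs:

```shell
# Absolute byte offset of the region read above: skip * bs
echo $((33920 * 32))
# 1085440 bytes, i.e. roughly 1 MiB into /dev/sdd
```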
Then I deliberately overwrote 32 bytes using random data:
INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/urandom of=/dev/sdd bs=32 count=1 seek=33920
INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
00000000  25 75 af 3e 87 b0 3b 04 78 ba 79 e3 64 fc 76 d2  |%u.>..;.x.y.d.v.|
00000010  9e 94 00 c2 45 a5 e1 d2 a8 86 f1 25 fc 18 07 5a  |....E......%...Z|
At this point the on-disk data is corrupted. I restarted the OSD
daemon on this host to make sure it flushed any potentially buffered
data. It restarted fine without noticing anything, which was expected.
Then I ran
ceph osd scrub 5
ceph osd deep-scrub 5
and waited for all scheduled scrub operations on all PGs to finish.
No inconsistency was found and no errors were reported; the scrubs just
finished OK, yet the data is still visibly corrupt via hexdump.
Did I just hit some block of data that WAS used by the OSD but was
marked deleted and is therefore no longer referenced, or am I missing
something? I would have expected Ceph to detect the disk corruption and
automatically repair the invalid data from a valid copy.
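To test my own "deleted/no longer used" hypothesis, I suppose I could check whether that offset falls into BlueStore's free space, with the OSD stopped; a rough sketch, assuming the OSD data path is /var/lib/ceph/osd/ceph-5:

```shell
systemctl stop ceph-osd@5
# dump BlueStore's free-extent list as JSON
ceph-bluestore-tool free-dump --path /var/lib/ceph/osd/ceph-5 > free.json
# then check whether byte offset 1085440 (= 33920 * 32) lies inside a free extent
systemctl start ceph-osd@5
```

I have not tried this yet; corrections welcome if there is a better way.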
I use only replicated pools in this lab setup, for RBD and CephFS.
Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx