You are right: the missing xattrs lead to ENOENT. Corrupting the file without removing the xattrs leads to an I/O error, without the PG being marked inconsistent. I created an issue: http://tracker.ceph.com/issues/20863
Hi!
I just found a strange thing while testing deep-scrub on 10.2.7:
1. Stop the OSD.
2. Change the primary copy's contents (using vi).
3. Start the OSD.
Then 'rados get' returns "No such file or directory". No error messages appear in the OSD log, and the cluster status is "HEALTH_OK".
4. ceph pg repair <num>
Then 'rados get' works as expected, and the "corrupted" data is repaired.
Once (I was unable to reproduce this) the error was detected on the fly (without an OSD restart):
2017-07-28 17:34:22.362968 7ff8bfa27700 -1 log_channel(cluster) log [ERR] : 16.d full-object read crc 0x78fcc738 != expected 0x5fd86d3e on 16:b36845b2:::testobject1:head
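The log line above reports that a checksum computed over the whole object did not match the expected one. A minimal Python sketch of that kind of comparison, assuming a standard bitwise CRC-32C (Castagnoli) checksum; `crc32c` and `check_full_object_read` are hypothetical names, and Ceph's exact seed and finalization conventions are not reproduced, so the values will not match the log.

```python
from typing import Optional


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the bit shifted out was 1.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def check_full_object_read(pg: str, oid: str, data: bytes,
                           expected: int) -> Optional[str]:
    """Return an error message if the full-object checksum mismatches, else None."""
    actual = crc32c(data)
    if actual != expected:
        return (f"{pg} full-object read crc 0x{actual:08x} "
                f"!= expected 0x{expected:08x} on {oid}")
    return None


if __name__ == "__main__":
    original = b"testobject1 contents"
    stored = crc32c(original)  # assumed to be recorded when the object was written
    print(check_full_object_read("16.d", "testobject1", original, stored))
    print(check_full_object_read("16.d", "testobject1", b"testobject1 CONTENTS", stored))
```

The bitwise loop is slow but easy to check against the standard CRC-32C test vector; a real implementation would use a table or hardware instructions.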
Did I miss that CRC storing/verifying started to work on XFS? If so, where are the checksums stored? In an xattr? I thought it was only implemented in BlueStore.
FileStore maintains CRC checksums opportunistically, such as when you do a full-object write. So in some circumstances it can detect objects with the wrong data and do repairs on its own. (And the checksum is stored in the object_info, which is written down in an xattr, yes.)
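The "opportunistic" behaviour described above can be modelled roughly like this: a full-object write records a checksum in the object's metadata, a partial write drops it, and a full read is verified only when a checksum is present. This is a hypothetical toy in Python, not FileStore's code; `ToyObject`, `ObjectInfo`, and `data_digest` are illustrative names, and `zlib.crc32` stands in for the CRC-32C that Ceph uses.

```python
import errno
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    """Toy stand-in for the per-object metadata FileStore keeps in an xattr."""
    size: int = 0
    data_digest: Optional[int] = None  # None: no trusted checksum recorded


class ToyObject:
    def __init__(self) -> None:
        self.data = b""
        self.info = ObjectInfo()

    def write_full(self, data: bytes) -> None:
        # Every byte is known, so a checksum can be recorded cheaply.
        self.data = data
        self.info = ObjectInfo(size=len(data), data_digest=zlib.crc32(data))

    def write(self, offset: int, data: bytes) -> None:
        # Updating the checksum would need the untouched bytes too,
        # so this sketch simply drops it after a partial write.
        buf = bytearray(self.data.ljust(offset, b"\0"))
        buf[offset:offset + len(data)] = data
        self.data = bytes(buf)
        self.info = ObjectInfo(size=len(self.data), data_digest=None)

    def read_full(self) -> bytes:
        # Verify only when a checksum exists: the "opportunistic" part.
        digest = self.info.data_digest
        if digest is not None and zlib.crc32(self.data) != digest:
            raise OSError(errno.EIO, "full-object read crc mismatch")
        return self.data


if __name__ == "__main__":
    obj = ToyObject()
    obj.write_full(b"testobject1 contents")
    obj.data = b"testobject1 CONTENTS"  # corrupt the data behind the store's back
    try:
        obj.read_full()
    except OSError as exc:
        print("detected:", exc)
```

In this toy, corruption after `write_full` makes `read_full` raise `EIO`, while the same corruption after a partial `write` goes unnoticed because no checksum is kept, which loosely matches detection working only "in some circumstances".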
I'm not certain why overwriting the file with vi made it return ENOENT, but it's probably because the file lost the xattrs storing its metadata. (...though I'd expect that to return an error on the primary that either prompts it to repair, or else incorrectly returns that raw error to the client. Can you create a ticket with exactly the steps you followed and the outcome you saw?) -Greg
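One way to check whether an edit stripped the metadata xattrs is to list the object file's extended attributes before and after editing. A minimal, Linux-only Python sketch, assuming FileStore stores its attributes under the `user.ceph.` prefix (an assumption; verify it against your own OSD data directory); `ceph_xattrs` and `report` are hypothetical helpers.

```python
import errno
import os
import sys
import tempfile
from typing import List

# ASSUMPTION: FileStore keeps object attributes (including the object_info
# that holds the data checksum) as xattrs whose names start with this prefix.
CEPH_XATTR_PREFIX = "user.ceph."


def ceph_xattrs(path: str) -> List[str]:
    """Return the sorted Ceph-prefixed xattr names on `path` (Linux only)."""
    try:
        names = os.listxattr(path)
    except OSError as exc:
        # No xattr support on this filesystem means there is nothing to find.
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return []
        raise
    return sorted(n for n in names if n.startswith(CEPH_XATTR_PREFIX))


def report(path: str) -> str:
    found = ceph_xattrs(path)
    if found:
        return f"{path}: {', '.join(found)}"
    return f"{path}: no {CEPH_XATTR_PREFIX}* xattrs (metadata missing?)"


if __name__ == "__main__":
    paths = sys.argv[1:]
    if paths:
        for p in paths:
            print(report(p))
    else:
        # Demo: a freshly created file has no Ceph xattrs.
        with tempfile.NamedTemporaryFile() as tmp:
            print(report(tmp.name))
```

Running it on the object file with the OSD stopped, once before and once after editing, shows whether the editor preserved the attributes.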
-- Dmitry Glushenok Jet Infosystems
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com