Hi Greg,

I believe Marc is referring to the corruption triggered by set_extsize on xfs. That option was disabled by default in 0.80.4... See the thread "firefly scrub error".

Cheers,
Dan

From: Gregory Farnum <greg@xxxxxxxxxxx>
Sent: Sep 16, 2014 8:15 PM
To: Marc
Cc: ceph-users at lists.ceph.com
Subject: Re: Still seing scrub errors in .80.5

On Tue, Sep 16, 2014 at 12:03 AM, Marc <mail at shoowin.de> wrote:
> Hello fellow cephalopods,
>
> every deep scrub seems to dig up inconsistencies (i.e. scrub errors)
> that we could use some help with diagnosing.
>
> I understand there used to be a data corruption issue before .80.3 so we
> made sure that all the nodes were upgraded to .80.5 and all the daemons
> were restarted (they all report .80.5 when contacted via socket).
> *After* that we ran a deep scrub, which obviously found errors, which we
> then repaired. But unfortunately, it's now a week later, and the next
> deep scrub has dug up new errors, which shouldn't have happened I think...?
>
> ceph.log shows these errors in between the deep scrub messages:
>
> 2014-09-15 07:56:23.164818 osd.15 10.10.10.55:6804/23853 364 : [ERR] 3.335 shard 2: soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3090820441 != known digest 3787996302
> 2014-09-15 07:56:23.164827 osd.15 10.10.10.55:6804/23853 365 : [ERR] 3.335 shard 6: soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3259686791 != known digest 3787996302
> 2014-09-15 07:56:28.485713 osd.15 10.10.10.55:6804/23853 366 : [ERR] 3.335 deep-scrub 0 missing, 1 inconsistent objects
> 2014-09-15 07:56:28.485734 osd.15 10.10.10.55:6804/23853 367 : [ERR] 3.335 deep-scrub 2 errors

Uh, I'm afraid those errors were never output as a result of bugs in Firefly. These are indicating actual data differences between the nodes, whereas the Firefly issue was a metadata flag that wasn't handled properly in mixed-version OSD clusters.
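Because these messages report real digest mismatches, it can help to tally them mechanically across a few weeks of ceph.log and see whether the same OSDs keep turning up, which would be relevant to the failing-disk explanation. The following is a minimal Python sketch, assuming the log format is exactly as in Marc's excerpt (one entry per line). The regex, the `DigestMismatch` record and the `parse_line`/`summarize` helpers are hypothetical names made up for illustration, not part of Ceph. It also assumes that the number after `shard` is the OSD id, which we believe holds for replicated pools but have not verified against the Ceph source.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Assumed shape of a Firefly deep-scrub digest error, inferred only from the
# ceph.log excerpt quoted in this thread.
DIGEST_RE = re.compile(
    r"\[ERR\] (?P<pgid>\d+\.[0-9a-f]+) shard (?P<shard>\d+): "
    r"soid (?P<soid>\S+) digest (?P<digest>\d+) != known digest (?P<known>\d+)"
)


class DigestMismatch(NamedTuple):
    pgid: str    # placement group, e.g. "3.335"
    shard: int   # assumption: the OSD id for replicated pools
    soid: str    # object id as printed in the log
    digest: int  # digest computed on that shard
    known: int   # value the log labels "known digest"


def parse_line(line: str) -> Optional[DigestMismatch]:
    """Return a DigestMismatch for a matching [ERR] line, else None."""
    m = DIGEST_RE.search(line)
    if m is None:
        return None
    return DigestMismatch(m["pgid"], int(m["shard"]), m["soid"],
                          int(m["digest"]), int(m["known"]))


def summarize(lines: Iterable[str]) -> Tuple[Dict[Tuple[str, str], List[DigestMismatch]], Counter]:
    """Group mismatches per (pg, object) and count how often each shard appears."""
    per_object: Dict[Tuple[str, str], List[DigestMismatch]] = defaultdict(list)
    per_shard: Counter = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # summary lines such as "deep-scrub 2 errors" are skipped
        per_object[(rec.pgid, rec.soid)].append(rec)
        per_shard[rec.shard] += 1
    return per_object, per_shard


# Entries copied from Marc's excerpt above.
SAMPLE_LOG = [
    "2014-09-15 07:56:23.164818 osd.15 10.10.10.55:6804/23853 364 : [ERR] 3.335 shard 2: "
    "soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3090820441 != known digest 3787996302",
    "2014-09-15 07:56:23.164827 osd.15 10.10.10.55:6804/23853 365 : [ERR] 3.335 shard 6: "
    "soid 6ba68735/rbd_data.59e3c2ae8944a.00000000000006b1/head//3 digest 3259686791 != known digest 3787996302",
    "2014-09-15 07:56:28.485734 osd.15 10.10.10.55:6804/23853 367 : [ERR] 3.335 deep-scrub 2 errors",
]

if __name__ == "__main__":
    objects, shards = summarize(SAMPLE_LOG)
    for (pgid, soid), recs in objects.items():
        seen = {r.digest for r in recs} | {recs[0].known}
        print(pgid, soid, "shards:", [r.shard for r in recs],
              "distinct digests:", len(seen))
    print("mismatches per shard:", shards.most_common())
```

In Marc's excerpt, shards 2 and 6 each report a digest that differs both from the known digest and from each other, so there are three distinct values for a single object. If a tally over a longer stretch of logs keeps naming the same shard, that OSD's disk would be a reasonable place to look first.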
I don't think Ceph has ever had a bug that would change the data payload between OSDs. Searching the tracker logs, the only entries with this error message come down to one of two causes:

1) The local filesystem is misbehaving under the workload we give it (and there are no known filesystem issues that are exposed by running firefly OSDs in default config that I can think of; certainly none with this error).
2) The disks themselves are bad. :/

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users at lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com