scrub error on firefly

One other thing we could try is catching this earlier (on the first read 
of corrupt data) instead of waiting for scrub.  If you are not especially 
performance sensitive, you can add

 filestore sloppy crc = true
 filestore sloppy crc block size = 524288

That will track and verify CRCs on any large (>512k) writes.  Smaller 
block sizes will give more precision and more checks, but will generate 
larger xattrs and have a bigger impact on performance...
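
For reference, a rough sketch of where those options would live and how to 
check that they took effect - the [osd] section placement and the 
admin-socket path are assumptions, adjust for your deployment:

 # ceph.conf on each OSD host
 [osd]
     filestore sloppy crc = true
     filestore sloppy crc block size = 524288

 # restart the OSDs so the filestore picks the settings up, then confirm
 # via the admin socket
 $ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep sloppy_crc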

sage


On Fri, 11 Jul 2014, Samuel Just wrote:

> When you get the next inconsistency, can you copy the actual objects
> from the osd store trees and get them to us?  That might provide a
> clue.
> -Sam
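
For what it's worth, here is a rough sketch of pulling those copies off the 
filestore trees for the pg/object named in the log below - the /var/lib/ceph 
path assumes a default install and the /tmp names are just placeholders:

 # on each OSD host in the acting set for pg 3.76 (osd.1 and osd.2 here)
 $ find /var/lib/ceph/osd/ceph-1/current/3.76_head/ \
       -name '*rb.0.b0ce3.238e1f29.00000000025c*'
 # copy the object file out with its xattrs intact for comparison
 $ cp -a <path from find> /tmp/osd1.rb.0.b0ce3.238e1f29.00000000025c
 $ getfattr -d -m - /tmp/osd1.rb.0.b0ce3.238e1f29.00000000025c > /tmp/osd1.xattrs

and the same on the other host, with ceph-2 in place of ceph-1.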
> 
> On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith <rbsmith at adams.edu> wrote:
> >
> >
> >
> > On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just <sam.just at inktank.com> wrote:
> >>
> >> It could be an indication of a problem on osd 5, but the timing is
> >> worrying.  Can you attach your ceph.conf?
> >
> >
> > Attached.
> >
> >>
> >> Have there been any osds
> >> going down, new osds added, anything to cause recovery?
> >
> >
> > I upgraded to firefly last week. As part of the upgrade I, obviously, had to
> > restart every osd. Also, I attempted to switch to the optimal tunables but
> > doing so degraded 27% of my cluster and made most of my VMs unresponsive. I
> > switched back to the legacy tunables and everything was happy again. Both of
> > those operations, of course, caused recoveries. I have made no changes since
> > then.
> >
> >>
> >>  Anything in
> >> dmesg to indicate an fs problem?
> >
> >
> > Nothing. The system went inconsistent again this morning, again on the same
> > rbd but different osds this time.
> >
> > 2014-07-11 05:48:12.857657 osd.1 192.168.253.77:6801/12608 904 : [ERR] 3.76
> > shard 1: soid 1280076/rb.0.b0ce3.238e1f29.00000000025c/head//3 digest
> > 2198242284 != known digest 3879754377
> > 2014-07-11 05:49:29.020024 osd.1 192.168.253.77:6801/12608 905 : [ERR] 3.76
> > deep-scrub 0 missing, 1 inconsistent objects
> > 2014-07-11 05:49:29.020029 osd.1 192.168.253.77:6801/12608 906 : [ERR] 3.76
> > deep-scrub 1 errors
> >
> > $ ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > pg 3.76 is active+clean+inconsistent, acting [1,2]
> > 1 scrub errors
> >
> >
> >>
> >>  Have you recently changed any
> >> settings?
> >
> >
> > I upgraded from bobtail to dumpling to firefly.
> >
> >>
> >> -Sam
> >>
> >> On Thu, Jul 10, 2014 at 2:58 PM, Randy Smith <rbsmith at adams.edu> wrote:
> >> > Greetings,
> >> >
> >> > Just a follow-up on my original issue. "ceph pg repair ..." fixed the
> >> > problem. However, today I got another inconsistent pg. It's interesting
> >> > to
> >> > me that this second error is in the same rbd image and appears to be
> >> > "close" to the previously inconsistent pg. (Even more fun, osd.5 was the
> >> > secondary in the first error and is the primary here, though the other
> >> > osd is different.)
> >> >
> >> > Is this indicative of a problem on osd.5 or perhaps a clue into what's
> >> > causing firefly to be so inconsistent?
> >> >
> >> > The relevant log entries are below.
> >> >
> >> > 2014-07-07 18:50:48.646407 osd.2 192.168.253.70:6801/56987 163 : [ERR]
> >> > 3.c6 shard 2: soid 34dc35c6/rb.0.b0ce3.238e1f29.00000000000b/head//3
> >> > digest 2256074002 != known digest 3998068918
> >> > 2014-07-07 18:51:36.936076 osd.2 192.168.253.70:6801/56987 164 : [ERR]
> >> > 3.c6 deep-scrub 0 missing, 1 inconsistent objects
> >> > 2014-07-07 18:51:36.936082 osd.2 192.168.253.70:6801/56987 165 : [ERR]
> >> > 3.c6 deep-scrub 1 errors
> >> >
> >> >
> >> > 2014-07-10 15:38:53.990328 osd.5 192.168.253.81:6800/10013 257 : [ERR]
> >> > 3.41 shard 1: soid e183cc41/rb.0.b0ce3.238e1f29.00000000024c/head//3
> >> > digest 3224286363 != known digest 3409342281
> >> > 2014-07-10 15:39:11.701276 osd.5 192.168.253.81:6800/10013 258 : [ERR]
> >> > 3.41 deep-scrub 0 missing, 1 inconsistent objects
> >> > 2014-07-10 15:39:11.701281 osd.5 192.168.253.81:6800/10013 259 : [ERR]
> >> > 3.41 deep-scrub 1 errors
> >> >
> >> >
> >> >
> >> > On Thu, Jul 10, 2014 at 12:05 PM, Chahal, Sudip <sudip.chahal at intel.com>
> >> > wrote:
> >> >>
> >> >> Thanks - so it appears that the advantage of the 3rd replica (relative to
> >> >> 2 replicas) has to do much more with recovering from two concurrent OSD
> >> >> failures than with inconsistencies found during deep scrub - would you
> >> >> agree?
> >> >>
> >> >> Re: repair - do you mean the "repair" process during deep scrub - if yes,
> >> >> this is automatic - correct?
> >> >>     Or
> >> >> Are you referring to the explicit manually initiated repair commands?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> -Sudip
> >> >>
> >> >> -----Original Message-----
> >> >> From: Samuel Just [mailto:sam.just at inktank.com]
> >> >> Sent: Thursday, July 10, 2014 10:50 AM
> >> >> To: Chahal, Sudip
> >> >> Cc: Christian Eichelmann; ceph-users at lists.ceph.com
> >> >> Subject: Re: [ceph-users] scrub error on firefly
> >> >>
> >> >> Repair, I think, will tend to choose the copy with the lowest osd number
> >> >> that is not obviously corrupted.  Even with three replicas, it does not
> >> >> do any kind of voting at this time.
> >> >> -Sam
> >> >>
> >> >> On Thu, Jul 10, 2014 at 10:39 AM, Chahal, Sudip
> >> >> <sudip.chahal at intel.com>
> >> >> wrote:
> >> >> > I've a basic related question re: Firefly operation - would appreciate
> >> >> > any insights:
> >> >> >
> >> >> > With three replicas, if checksum inconsistencies across replicas are
> >> >> > found during deep-scrub, then:
> >> >> >         a. does the majority win, or is the primary always the winner
> >> >> >            and used to overwrite the secondaries?
> >> >> >         b. is this reconciliation done automatically during deep-scrub,
> >> >> >            or does each reconciliation have to be executed manually by
> >> >> >            the administrator?
> >> >> >
> >> >> > With 2 replicas - how are things different (if at all):
> >> >> >         a. The primary is declared the winner - correct?
> >> >> >         b. is this reconciliation done automatically during deep-scrub,
> >> >> >            or does it have to be done "manually" because there is no
> >> >> >            majority?
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > -Sudip
> >> >> >
> >> >> >
> >> >> > -----Original Message-----
> >> >> > From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf
> >> >> > Of Samuel Just
> >> >> > Sent: Thursday, July 10, 2014 10:16 AM
> >> >> > To: Christian Eichelmann
> >> >> > Cc: ceph-users at lists.ceph.com
> >> >> > Subject: Re: [ceph-users] scrub error on firefly
> >> >> >
> >> >> > Can you attach your ceph.conf for your osds?
> >> >> > -Sam
> >> >> >
> >> >> > On Thu, Jul 10, 2014 at 8:01 AM, Christian Eichelmann
> >> >> > <christian.eichelmann at 1und1.de> wrote:
> >> >> >> I can also confirm that after upgrading to firefly, both of our
> >> >> >> clusters (test and live) went from 0 scrub errors each for about
> >> >> >> 6 months to about 9-12 per week...
> >> >> >> This also makes me kind of nervous, since as far as I know everything
> >> >> >> "ceph pg repair" does is copy the primary object to all replicas,
> >> >> >> no matter which object is the correct one.
> >> >> >> Of course the described method of manual checking works (for pools
> >> >> >> with more than 2 replicas), but doing this in a large cluster nearly
> >> >> >> every week is horribly time-consuming and error prone.
> >> >> >> It would be great to get an explanation for the increased number of
> >> >> >> scrub errors since firefly. Were they just not detected correctly in
> >> >> >> previous versions? Or is there maybe something wrong with the new
> >> >> >> code?
> >> >> >>
> >> >> >> Actually, our company is currently preventing our projects from moving
> >> >> >> to Ceph because of this problem.
> >> >> >>
> >> >> >> Regards,
> >> >> >> Christian
> >> >> >> ________________________________
> >> >> >> From: ceph-users [ceph-users-bounces at lists.ceph.com] on behalf of
> >> >> >> Travis Rhoden [trhoden at gmail.com]
> >> >> >> Sent: Thursday, July 10, 2014 16:24
> >> >> >> To: Gregory Farnum
> >> >> >> Cc: ceph-users at lists.ceph.com
> >> >> >> Subject: Re: [ceph-users] scrub error on firefly
> >> >> >>
> >> >> >> And actually, just to follow up, it does seem like there are some
> >> >> >> additional smarts beyond just using the primary to overwrite the
> >> >> >> secondaries...  Since I captured md5 sums before and after the
> >> >> >> repair, I can say that in this particular instance, the secondary copy
> >> >> >> was used to overwrite the primary.
> >> >> >> So, I'm just trusting Ceph to do the right thing, and so far it seems
> >> >> >> to, but the comments here about needing to determine the correct
> >> >> >> object and place it on the primary PG make me wonder if I've been
> >> >> >> missing something.
> >> >> >>
> >> >> >>  - Travis
> >> >> >>
> >> >> >>
> >> >> >> On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden <trhoden at gmail.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> I can also say that after a recent upgrade to Firefly, I have
> >> >> >>> experienced a massive uptick in scrub errors.  The cluster was on
> >> >> >>> cuttlefish for about a year, and had maybe one or two scrub errors.
> >> >> >>> After upgrading to Firefly, we've probably seen 3 to 4 dozen in the
> >> >> >>> last month or so (was getting 2-3 a day for a few weeks until the
> >> >> >>> whole cluster was rescrubbed, it seemed).
> >> >> >>>
> >> >> >>> What I cannot determine, however, is how to know which copy of the
> >> >> >>> object is busted.
> >> >> >>> For example, just today I ran into a scrub error.  The object has
> >> >> >>> two copies and is an 8MB piece of an RBD, and the copies have
> >> >> >>> identical timestamps and identical xattr names and values.  But they
> >> >> >>> definitely have different MD5 sums. How do I know which one is correct?
> >> >> >>>
> >> >> >>> I've been just kicking off pg repair each time, which seems to just
> >> >> >>> use the primary copy to overwrite the others.  Haven't run into any
> >> >> >>> issues with that so far, but it does make me nervous.
> >> >> >>>
> >> >> >>>  - Travis
> >> >> >>>
> >> >> >>>
> >> >> >>> On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum <greg at inktank.com>
> >> >> >>> wrote:
> >> >> >>>>
> >> >> >>>> It's not very intuitive or easy to look at right now (there are
> >> >> >>>> plans from the recent developer summit to improve things), but the
> >> >> >>>> central log should have output about exactly what objects are
> >> >> >>>> busted. You'll then want to compare the copies manually to
> >> >> >>>> determine which ones are good or bad, get the good copy on the
> >> >> >>>> primary (make sure you preserve xattrs), and run repair.
> >> >> >>>> -Greg
> >> >> >>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
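
A minimal sketch of that manual comparison, assuming the copies were pulled
out of the filestore as in the note further up - the pg id, object paths and
/tmp names below are placeholders:

 # the central log (typically /var/log/ceph/ceph.log on a monitor host)
 # names the inconsistent objects
 $ grep ERR /var/log/ceph/ceph.log
 # compare the copies taken from the two OSDs
 $ md5sum /tmp/osd1.obj /tmp/osd2.obj
 $ getfattr -d -m - /tmp/osd1.obj > /tmp/osd1.xattrs
 $ getfattr -d -m - /tmp/osd2.obj > /tmp/osd2.xattrs
 $ diff /tmp/osd1.xattrs /tmp/osd2.xattrs
 # if the replica turns out to be the good copy, put it in place on the
 # primary (with that OSD stopped, cp -a to keep xattrs), restart the OSD,
 # then let repair fix up the rest
 $ cp -a /tmp/osd2.obj /var/lib/ceph/osd/ceph-1/current/3.76_head/<object file>
 $ ceph pg repair 3.76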
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith <rbsmith at adams.edu>
> >> >> >>>> wrote:
> >> >> >>>> > Greetings,
> >> >> >>>> >
> >> >> >>>> > I upgraded to firefly last week and I suddenly received this
> >> >> >>>> > error:
> >> >> >>>> >
> >> >> >>>> > health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> >> >> >>>> >
> >> >> >>>> > ceph health detail shows the following:
> >> >> >>>> >
> >> >> >>>> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 3.c6 is
> >> >> >>>> > active+clean+inconsistent, acting [2,5]
> >> >> >>>> > 1 scrub errors
> >> >> >>>> >
> >> >> >>>> > The docs say that I can run `ceph pg repair 3.c6` to fix this.
> >> >> >>>> > What I want to know is what are the risks of data loss if I run
> >> >> >>>> > that command in this state and how can I mitigate them?
> >> >> >>>> >
> >> >> >>>> > --
> >> >> >>>> > Randall Smith
> >> >> >>>> > Computing Services
> >> >> >>>> > Adams State University
> >> >> >>>> > http://www.adams.edu/
> >> >> >>>> > 719-587-7741
> >> >> >>>> >
> >> >> >>>> > _______________________________________________
> >> >> >>>> > ceph-users mailing list
> >> >> >>>> > ceph-users at lists.ceph.com
> >> >> >>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >>>> >
> >> >> >>>> _______________________________________________
> >> >> >>>> ceph-users mailing list
> >> >> >>>> ceph-users at lists.ceph.com
> >> >> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >>>
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >> _______________________________________________
> >> >> >> ceph-users mailing list
> >> >> >> ceph-users at lists.ceph.com
> >> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> >>
> >> >> > _______________________________________________
> >> >> > ceph-users mailing list
> >> >> > ceph-users at lists.ceph.com
> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >> _______________________________________________
> >> >> ceph-users mailing list
> >> >> ceph-users at lists.ceph.com
> >> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Randall Smith
> >> > Computing Services
> >> > Adams State University
> >> > http://www.adams.edu/
> >> > 719-587-7741
> >> >
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users at lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
> >
> >
> > --
> > Randall Smith
> > Computing Services
> > Adams State University
> > http://www.adams.edu/
> > 719-587-7741
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

