And grab the xattrs as well.
-Sam

On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just <sam.just at inktank.com> wrote:
> Right.
> -Sam
>
> On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith <rbsmith at adams.edu> wrote:
>> Greetings,
>>
>> I'm using xfs.
>>
>> Also, when, in a previous email, you asked if I could send the object, do
>> you mean the files from each server named something like this:
>> ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3
>> ?
>>
>> On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>
>>> Also, what filesystem are you using?
>>> -Sam
>>>
>>> On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil <sweil at redhat.com> wrote:
>>> > One other thing we might also try is catching this earlier (on first read
>>> > of corrupt data) instead of waiting for scrub. If you are not super
>>> > performance sensitive, you can add
>>> >
>>> >     filestore sloppy crc = true
>>> >     filestore sloppy crc block size = 524288
>>> >
>>> > That will track and verify CRCs on any large (>512k) writes. Smaller
>>> > block sizes will give more precision and more checks, but will generate
>>> > larger xattrs and have a bigger impact on performance...
>>> >
>>> > sage
>>> >
>>> > On Fri, 11 Jul 2014, Samuel Just wrote:
>>> >
>>> >> When you get the next inconsistency, can you copy the actual objects
>>> >> from the osd store trees and get them to us? That might provide a clue.
>>> >> -Sam
>>> >>
>>> >> On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >
>>> >> > On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just <sam.just at inktank.com> wrote:
>>> >> >>
>>> >> >> It could be an indication of a problem on osd 5, but the timing is
>>> >> >> worrying. Can you attach your ceph.conf?
>>> >> >
>>> >> > Attached.
>>> >> >
>>> >> >> Have there been any osds going down, new osds added, anything to
>>> >> >> cause recovery?
>>> >> >
>>> >> > I upgraded to firefly last week. As part of the upgrade I, obviously, had
>>> >> > to restart every osd. Also, I attempted to switch to the optimal tunables
>>> >> > but doing so degraded 27% of my cluster and made most of my VMs
>>> >> > unresponsive. I switched back to the legacy tunables and everything was
>>> >> > happy again. Both of those operations, of course, caused recoveries. I
>>> >> > have made no changes since then.
>>> >> >
>>> >> >> Anything in dmesg to indicate an fs problem?
>>> >> >
>>> >> > Nothing. The system went inconsistent again this morning, again on the
>>> >> > same rbd but different osds this time.
>>> >> >
>>> >> > 2014-07-11 05:48:12.857657 osd.1 192.168.253.77:6801/12608 904 : [ERR] 3.76 shard 1: soid 1280076/rb.0.b0ce3.238e1f29.00000000025c/head//3 digest 2198242284 != known digest 3879754377
>>> >> > 2014-07-11 05:49:29.020024 osd.1 192.168.253.77:6801/12608 905 : [ERR] 3.76 deep-scrub 0 missing, 1 inconsistent objects
>>> >> > 2014-07-11 05:49:29.020029 osd.1 192.168.253.77:6801/12608 906 : [ERR] 3.76 deep-scrub 1 errors
>>> >> >
>>> >> > $ ceph health detail
>>> >> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> > pg 3.76 is active+clean+inconsistent, acting [1,2]
>>> >> > 1 scrub errors
>>> >> >
>>> >> >> Have you recently changed any settings?
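A minimal sketch of what Sam is asking for - copying an object replica out of an OSD's store tree together with its xattrs - assuming the default /var/lib/ceph/osd data path; the OSD id and output filenames are illustrative:

    # Run on each OSD host in the PG's acting set (osd.2 shown as an example).
    OBJ=$(find /var/lib/ceph/osd/ceph-2/current/3.c6_head \
          -name 'rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3')

    # Copy the object data, preserving attributes where possible.
    cp -a "$OBJ" /tmp/3.c6-osd2.obj

    # Dump all extended attributes (run as root to include non-user namespaces).
    getfattr -d -m '.*' -e hex "$OBJ" > /tmp/3.c6-osd2.xattrs

    # Or bundle data and xattrs together with GNU tar.
    tar --xattrs -czf /tmp/3.c6-osd2.tar.gz "$OBJ"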
>>> >> >
>>> >> > I upgraded from bobtail to dumpling to firefly.
>>> >> >
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Jul 10, 2014 at 2:58 PM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >> > Greetings,
>>> >> >> >
>>> >> >> > Just a follow up on my original issue. =ceph pg repair ...= fixed the
>>> >> >> > problem. However, today I got another inconsistent pg. It's interesting
>>> >> >> > to me that this second error is in the same rbd image and appears to be
>>> >> >> > "close" to the previously inconsistent pg. (Even more fun, osd.5 was the
>>> >> >> > secondary in the first error and is the primary here though the other
>>> >> >> > osd is different.)
>>> >> >> >
>>> >> >> > Is this indicative of a problem on osd.5 or perhaps a clue into what's
>>> >> >> > causing firefly to be so inconsistent?
>>> >> >> >
>>> >> >> > The relevant log entries are below.
>>> >> >> >
>>> >> >> > 2014-07-07 18:50:48.646407 osd.2 192.168.253.70:6801/56987 163 : [ERR] 3.c6 shard 2: soid 34dc35c6/rb.0.b0ce3.238e1f29.00000000000b/head//3 digest 2256074002 != known digest 3998068918
>>> >> >> > 2014-07-07 18:51:36.936076 osd.2 192.168.253.70:6801/56987 164 : [ERR] 3.c6 deep-scrub 0 missing, 1 inconsistent objects
>>> >> >> > 2014-07-07 18:51:36.936082 osd.2 192.168.253.70:6801/56987 165 : [ERR] 3.c6 deep-scrub 1 errors
>>> >> >> >
>>> >> >> > 2014-07-10 15:38:53.990328 osd.5 192.168.253.81:6800/10013 257 : [ERR] 3.41 shard 1: soid e183cc41/rb.0.b0ce3.238e1f29.00000000024c/head//3 digest 3224286363 != known digest 3409342281
>>> >> >> > 2014-07-10 15:39:11.701276 osd.5 192.168.253.81:6800/10013 258 : [ERR] 3.41 deep-scrub 0 missing, 1 inconsistent objects
>>> >> >> > 2014-07-10 15:39:11.701281 osd.5 192.168.253.81:6800/10013 259 : [ERR] 3.41 deep-scrub 1 errors
>>> >> >> >
>>> >> >> > On Thu, Jul 10, 2014 at 12:05 PM, Chahal, Sudip <sudip.chahal at intel.com> wrote:
>>> >> >> >>
>>> >> >> >> Thanks - so it appears that the advantage of the 3rd replica (relative
>>> >> >> >> to 2 replicas) has to do much more with recovering from two concurrent
>>> >> >> >> OSD failures than with inconsistencies found during deep scrub - would
>>> >> >> >> you agree?
>>> >> >> >>
>>> >> >> >> Re: repair - do you mean the "repair" process during deep scrub - if
>>> >> >> >> yes, this is automatic - correct?
>>> >> >> >> Or
>>> >> >> >> Are you referring to the explicit manually initiated repair commands?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >>
>>> >> >> >> -Sudip
>>> >> >> >>
>>> >> >> >> -----Original Message-----
>>> >> >> >> From: Samuel Just [mailto:sam.just at inktank.com]
>>> >> >> >> Sent: Thursday, July 10, 2014 10:50 AM
>>> >> >> >> To: Chahal, Sudip
>>> >> >> >> Cc: Christian Eichelmann; ceph-users at lists.ceph.com
>>> >> >> >> Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >>
>>> >> >> >> Repair I think will tend to choose the copy with the lowest osd number
>>> >> >> >> which is not obviously corrupted.
>>> >> >> Even with three replicas, it does not do any kind of voting at this time.
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Jul 10, 2014 at 10:39 AM, Chahal, Sudip <sudip.chahal at intel.com> wrote:
>>> >> >> > I've a basic related question re: Firefly operation - would appreciate
>>> >> >> > any insights:
>>> >> >> >
>>> >> >> > With three replicas, if checksum inconsistencies across replicas are
>>> >> >> > found during deep-scrub then:
>>> >> >> >     a. does the majority win, or is the primary always the winner and
>>> >> >> >        used to overwrite the secondaries?
>>> >> >> >     b. is this reconciliation done automatically during deep-scrub, or
>>> >> >> >        does each reconciliation have to be executed manually by the
>>> >> >> >        administrator?
>>> >> >> >
>>> >> >> > With 2 replicas - how are things different (if at all):
>>> >> >> >     a. The primary is declared the winner - correct?
>>> >> >> >     b. is this reconciliation done automatically during deep-scrub, or
>>> >> >> >        does it have to be done "manually" because there is no majority?
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> >
>>> >> >> > -Sudip
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf
>>> >> >> > Of Samuel Just
>>> >> >> > Sent: Thursday, July 10, 2014 10:16 AM
>>> >> >> > To: Christian Eichelmann
>>> >> >> > Cc: ceph-users at lists.ceph.com
>>> >> >> > Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >
>>> >> >> > Can you attach your ceph.conf for your osds?
>>> >> >> > -Sam
>>> >> >> >
>>> >> >> > On Thu, Jul 10, 2014 at 8:01 AM, Christian Eichelmann
>>> >> >> > <christian.eichelmann at 1und1.de> wrote:
>>> >> >> >> I can also confirm that after upgrading to firefly, both of our
>>> >> >> >> clusters (test and live) went from 0 scrub errors each for about
>>> >> >> >> 6 months to about 9-12 per week...
>>> >> >> >> This also makes me kind of nervous, since as far as I know everything
>>> >> >> >> "ceph pg repair" does is copy the primary object to all replicas, no
>>> >> >> >> matter which object is the correct one.
>>> >> >> >> Of course the described method of manual checking works (for pools
>>> >> >> >> with more than 2 replicas), but doing this in a large cluster nearly
>>> >> >> >> every week is horribly time-consuming and error prone.
>>> >> >> >> It would be great to get an explanation for the increased number of
>>> >> >> >> scrub errors since firefly. Were they just not detected correctly in
>>> >> >> >> previous versions? Or is there maybe something wrong with the new code?
>>> >> >> >>
>>> >> >> >> Actually, our company is currently preventing our projects from moving
>>> >> >> >> to Ceph because of this problem.
>>> >> >> >>
>>> >> >> >> Regards,
>>> >> >> >> Christian
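The manual check Christian refers to, sketched for a pool with three replicas; the PG and object names are the ones from this thread, and the paths and OSD ids are illustrative:

    # Find the PG's acting set, e.g. "acting [2,5,7]" (ids are hypothetical).
    ceph pg map 3.c6

    # On each OSD host in the acting set, checksum that OSD's copy of the object.
    find /var/lib/ceph/osd/ceph-*/current/3.c6_head \
         -name 'rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3' \
         -exec md5sum {} \;

    # With three replicas, the two matching checksums almost certainly identify
    # the good copies; the odd one out is the corrupted replica.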
>>> >> >> >> ________________________________
>>> >> >> >> From: ceph-users [ceph-users-bounces at lists.ceph.com] on behalf of
>>> >> >> >> Travis Rhoden [trhoden at gmail.com]
>>> >> >> >> Sent: Thursday, 10 July 2014 16:24
>>> >> >> >> To: Gregory Farnum
>>> >> >> >> Cc: ceph-users at lists.ceph.com
>>> >> >> >> Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >>
>>> >> >> >> And actually, just to follow up, it does seem like there are some
>>> >> >> >> additional smarts beyond just using the primary to overwrite the
>>> >> >> >> secondaries... Since I captured md5 sums before and after the repair,
>>> >> >> >> I can say that in this particular instance the secondary copy was used
>>> >> >> >> to overwrite the primary.
>>> >> >> >> So I'm just trusting Ceph to do the right thing, and so far it seems
>>> >> >> >> to, but the comments here about needing to determine the correct
>>> >> >> >> object and place it on the primary PG make me wonder if I've been
>>> >> >> >> missing something.
>>> >> >> >>
>>> >> >> >> - Travis
>>> >> >> >>
>>> >> >> >> On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden <trhoden at gmail.com> wrote:
>>> >> >> >>>
>>> >> >> >>> I can also say that after a recent upgrade to Firefly, I have
>>> >> >> >>> experienced a massive uptick in scrub errors. The cluster was on
>>> >> >> >>> cuttlefish for about a year and had maybe one or two scrub errors.
>>> >> >> >>> After upgrading to Firefly, we've probably seen 3 to 4 dozen in the
>>> >> >> >>> last month or so (we were getting 2-3 a day for a few weeks until the
>>> >> >> >>> whole cluster was rescrubbed, it seemed).
>>> >> >> >>>
>>> >> >> >>> What I cannot determine, however, is how to know which object is
>>> >> >> >>> busted. For example, just today I ran into a scrub error. The object
>>> >> >> >>> has two copies and is an 8MB piece of an RBD, with identical
>>> >> >> >>> timestamps and identical xattr names and values. But it definitely
>>> >> >> >>> has a different MD5 sum on each copy. How do I know which one is
>>> >> >> >>> correct?
>>> >> >> >>>
>>> >> >> >>> I've been just kicking off pg repair each time, which seems to just
>>> >> >> >>> use the primary copy to overwrite the others. I haven't run into any
>>> >> >> >>> issues with that so far, but it does make me nervous.
>>> >> >> >>>
>>> >> >> >>> - Travis
>>> >> >> >>>
>>> >> >> >>> On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum <greg at inktank.com> wrote:
>>> >> >> >>>>
>>> >> >> >>>> It's not very intuitive or easy to look at right now (there are
>>> >> >> >>>> plans from the recent developer summit to improve things), but the
>>> >> >> >>>> central log should have output about exactly what objects are busted.
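On a default installation, the central log Greg mentions is the cluster log on the monitor hosts, and the offending objects can be pulled out of it roughly like this (the log path and grep pattern are illustrative):

    # Scrub failures are recorded in the cluster log with the broken object's soid.
    grep ERR /var/log/ceph/ceph.log | grep -E 'deep-scrub|soid'

    # 'ceph health detail' lists the inconsistent PGs and their acting sets.
    ceph health detail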
>>> >> >> >>>> You'll then want to compare the copies manually to determine which
>>> >> >> >>>> ones are good or bad, get the good copy on the primary (make sure
>>> >> >> >>>> you preserve xattrs), and run repair.
>>> >> >> >>>> -Greg
>>> >> >> >>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>> >> >> >>>>
>>> >> >> >>>> On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >> >>>> > Greetings,
>>> >> >> >>>> >
>>> >> >> >>>> > I upgraded to firefly last week and I suddenly received this error:
>>> >> >> >>>> >
>>> >> >> >>>> > health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> >> >>>> >
>>> >> >> >>>> > ceph health detail shows the following:
>>> >> >> >>>> >
>>> >> >> >>>> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> >> >>>> > pg 3.c6 is active+clean+inconsistent, acting [2,5]
>>> >> >> >>>> > 1 scrub errors
>>> >> >> >>>> >
>>> >> >> >>>> > The docs say that I can run `ceph pg repair 3.c6` to fix this.
>>> >> >> >>>> > What I want to know is what are the risks of data loss if I run
>>> >> >> >>>> > that command in this state and how can I mitigate them?
>>> >> >> >>>> >
>>> >> >> >>>> > --
>>> >> >> >>>> > Randall Smith
>>> >> >> >>>> > Computing Services
>>> >> >> >>>> > Adams State University
>>> >> >> >>>> > http://www.adams.edu/
>>> >> >> >>>> > 719-587-7741
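One way to mitigate the risk Randy asks about is to keep a copy of every replica before letting repair overwrite anything; a rough sketch, with illustrative backup paths:

    # On each OSD host in "acting [2,5]", back up that OSD's copies of the
    # suspect image's objects in this PG before repairing.
    mkdir -p /root/pg-3.c6-backup
    find /var/lib/ceph/osd/ceph-*/current/3.c6_head \
         -name 'rb.0.b0ce3.238e1f29.*' -exec cp -a {} /root/pg-3.c6-backup/ \;

    # Then trigger the repair. Per Sam's note above, repair does not vote
    # between copies, so verify the data afterwards rather than assuming the
    # surviving copy was the good one.
    ceph pg repair 3.c6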
>>
>> --
>> Randall Smith
>> Computing Services
>> Adams State University
>> http://www.adams.edu/
>> 719-587-7741