And grab the xattrs as well.
-Sam

On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just <sam.just at inktank.com> wrote:
> Right.
> -Sam
>
> On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith <rbsmith at adams.edu> wrote:
>> Greetings,
>>
>> I'm using xfs.
>>
>> Also, when, in a previous email, you asked if I could send the object, do
>> you mean the files from each server named something like this:
>> ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3
>> ?
>>
>> On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>
>>> Also, what filesystem are you using?
>>> -Sam
>>>
>>> On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil <sweil at redhat.com> wrote:
>>> > One other thing we might also try is catching this earlier (on first read
>>> > of corrupt data) instead of waiting for scrub. If you are not super
>>> > performance sensitive, you can add
>>> >
>>> >     filestore sloppy crc = true
>>> >     filestore sloppy crc block size = 524288
>>> >
>>> > That will track and verify CRCs on any large (>512k) writes. Smaller
>>> > block sizes will give more precision and more checks, but will generate
>>> > larger xattrs and have a bigger impact on performance...
>>> >
>>> > sage
>>> >
>>> > On Fri, 11 Jul 2014, Samuel Just wrote:
>>> >
>>> >> When you get the next inconsistency, can you copy the actual objects
>>> >> from the osd store trees and get them to us? That might provide a clue.
>>> >> -Sam
>>> >>
>>> >> On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >
>>> >> > On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just <sam.just at inktank.com> wrote:
>>> >> >>
>>> >> >> It could be an indication of a problem on osd 5, but the timing is
>>> >> >> worrying. Can you attach your ceph.conf?
>>> >> >
>>> >> > Attached.
>>> >> >
>>> >> >> Have there been any osds going down, new osds added, anything to
>>> >> >> cause recovery?
>>> >> >
>>> >> > I upgraded to firefly last week. As part of the upgrade I, obviously, had
>>> >> > to restart every osd. Also, I attempted to switch to the optimal tunables
>>> >> > but doing so degraded 27% of my cluster and made most of my VMs
>>> >> > unresponsive. I switched back to the legacy tunables and everything was
>>> >> > happy again. Both of those operations, of course, caused recoveries. I
>>> >> > have made no changes since then.
>>> >> >
>>> >> >> Anything in dmesg to indicate an fs problem?
>>> >> >
>>> >> > Nothing. The system went inconsistent again this morning, again on the
>>> >> > same rbd but different osds this time.
>>> >> >
>>> >> > 2014-07-11 05:48:12.857657 osd.1 192.168.253.77:6801/12608 904 : [ERR] 3.76 shard 1: soid 1280076/rb.0.b0ce3.238e1f29.00000000025c/head//3 digest 2198242284 != known digest 3879754377
>>> >> > 2014-07-11 05:49:29.020024 osd.1 192.168.253.77:6801/12608 905 : [ERR] 3.76 deep-scrub 0 missing, 1 inconsistent objects
>>> >> > 2014-07-11 05:49:29.020029 osd.1 192.168.253.77:6801/12608 906 : [ERR] 3.76 deep-scrub 1 errors
>>> >> >
>>> >> > $ ceph health detail
>>> >> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> > pg 3.76 is active+clean+inconsistent, acting [1,2]
>>> >> > 1 scrub errors
>>> >> >
>>> >> >> Have you recently changed any settings?
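A minimal sketch of what Sam is asking for - copying an object replica out of an OSD's store tree together with its xattrs - assuming the default /var/lib/ceph/osd data path; the OSD id and output filenames are illustrative:

    # Run on each OSD host in the PG's acting set (osd.2 shown as an example).
    OBJ=$(find /var/lib/ceph/osd/ceph-2/current/3.c6_head \
          -name 'rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3')

    # Copy the object data, preserving attributes where possible.
    cp -a "$OBJ" /tmp/3.c6-osd2.obj

    # Dump all extended attributes (run as root to include non-user namespaces).
    getfattr -d -m '.*' -e hex "$OBJ" > /tmp/3.c6-osd2.xattrs

    # Or bundle data and xattrs together with GNU tar.
    tar --xattrs -czf /tmp/3.c6-osd2.tar.gz "$OBJ"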
>>> >> >
>>> >> > I upgraded from bobtail to dumpling to firefly.
>>> >> >
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Jul 10, 2014 at 2:58 PM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >> > Greetings,
>>> >> >> >
>>> >> >> > Just a follow up on my original issue. =ceph pg repair ...= fixed the
>>> >> >> > problem. However, today I got another inconsistent pg. It's interesting
>>> >> >> > to me that this second error is in the same rbd image and appears to be
>>> >> >> > "close" to the previously inconsistent pg. (Even more fun, osd.5 was the
>>> >> >> > secondary in the first error and is the primary here though the other
>>> >> >> > osd is different.)
>>> >> >> >
>>> >> >> > Is this indicative of a problem on osd.5 or perhaps a clue into what's
>>> >> >> > causing firefly to be so inconsistent?
>>> >> >> >
>>> >> >> > The relevant log entries are below.
>>> >> >> >
>>> >> >> > 2014-07-07 18:50:48.646407 osd.2 192.168.253.70:6801/56987 163 : [ERR] 3.c6 shard 2: soid 34dc35c6/rb.0.b0ce3.238e1f29.00000000000b/head//3 digest 2256074002 != known digest 3998068918
>>> >> >> > 2014-07-07 18:51:36.936076 osd.2 192.168.253.70:6801/56987 164 : [ERR] 3.c6 deep-scrub 0 missing, 1 inconsistent objects
>>> >> >> > 2014-07-07 18:51:36.936082 osd.2 192.168.253.70:6801/56987 165 : [ERR] 3.c6 deep-scrub 1 errors
>>> >> >> >
>>> >> >> > 2014-07-10 15:38:53.990328 osd.5 192.168.253.81:6800/10013 257 : [ERR] 3.41 shard 1: soid e183cc41/rb.0.b0ce3.238e1f29.00000000024c/head//3 digest 3224286363 != known digest 3409342281
>>> >> >> > 2014-07-10 15:39:11.701276 osd.5 192.168.253.81:6800/10013 258 : [ERR] 3.41 deep-scrub 0 missing, 1 inconsistent objects
>>> >> >> > 2014-07-10 15:39:11.701281 osd.5 192.168.253.81:6800/10013 259 : [ERR] 3.41 deep-scrub 1 errors
>>> >> >> >
>>> >> >> > On Thu, Jul 10, 2014 at 12:05 PM, Chahal, Sudip <sudip.chahal at intel.com> wrote:
>>> >> >> >>
>>> >> >> >> Thanks - so it appears that the advantage of the 3rd replica (relative
>>> >> >> >> to 2 replicas) has to do much more with recovering from two concurrent
>>> >> >> >> OSD failures than with inconsistencies found during deep scrub - would
>>> >> >> >> you agree?
>>> >> >> >>
>>> >> >> >> Re: repair - do you mean the "repair" process during deep scrub - if
>>> >> >> >> yes, this is automatic - correct?
>>> >> >> >> Or
>>> >> >> >> Are you referring to the explicit manually initiated repair commands?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >>
>>> >> >> >> -Sudip
>>> >> >> >>
>>> >> >> >> -----Original Message-----
>>> >> >> >> From: Samuel Just [mailto:sam.just at inktank.com]
>>> >> >> >> Sent: Thursday, July 10, 2014 10:50 AM
>>> >> >> >> To: Chahal, Sudip
>>> >> >> >> Cc: Christian Eichelmann; ceph-users at lists.ceph.com
>>> >> >> >> Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >>
>>> >> >> >> Repair I think will tend to choose the copy with the lowest osd number
>>> >> >> >> which is not obviously corrupted.
>>> >> >> Even with three replicas, it does not do any kind of voting at this time.
>>> >> >> -Sam
>>> >> >>
>>> >> >> On Thu, Jul 10, 2014 at 10:39 AM, Chahal, Sudip <sudip.chahal at intel.com> wrote:
>>> >> >> > I've a basic related question re: Firefly operation - would appreciate
>>> >> >> > any insights:
>>> >> >> >
>>> >> >> > With three replicas, if checksum inconsistencies across replicas are
>>> >> >> > found during deep-scrub then:
>>> >> >> >     a. does the majority win, or is the primary always the winner and
>>> >> >> >        used to overwrite the secondaries?
>>> >> >> >     b. is this reconciliation done automatically during deep-scrub, or
>>> >> >> >        does each reconciliation have to be executed manually by the
>>> >> >> >        administrator?
>>> >> >> >
>>> >> >> > With 2 replicas - how are things different (if at all):
>>> >> >> >     a. The primary is declared the winner - correct?
>>> >> >> >     b. is this reconciliation done automatically during deep-scrub, or
>>> >> >> >        does it have to be done "manually" because there is no majority?
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> >
>>> >> >> > -Sudip
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf
>>> >> >> > Of Samuel Just
>>> >> >> > Sent: Thursday, July 10, 2014 10:16 AM
>>> >> >> > To: Christian Eichelmann
>>> >> >> > Cc: ceph-users at lists.ceph.com
>>> >> >> > Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >
>>> >> >> > Can you attach your ceph.conf for your osds?
>>> >> >> > -Sam
>>> >> >> >
>>> >> >> > On Thu, Jul 10, 2014 at 8:01 AM, Christian Eichelmann
>>> >> >> > <christian.eichelmann at 1und1.de> wrote:
>>> >> >> >> I can also confirm that after upgrading to firefly, both of our
>>> >> >> >> clusters (test and live) went from 0 scrub errors each for about
>>> >> >> >> 6 months to about 9-12 per week...
>>> >> >> >> This also makes me kind of nervous, since as far as I know everything
>>> >> >> >> "ceph pg repair" does is copy the primary object to all replicas, no
>>> >> >> >> matter which object is the correct one.
>>> >> >> >> Of course the described method of manual checking works (for pools
>>> >> >> >> with more than 2 replicas), but doing this in a large cluster nearly
>>> >> >> >> every week is horribly time-consuming and error prone.
>>> >> >> >> It would be great to get an explanation for the increased number of
>>> >> >> >> scrub errors since firefly. Were they just not detected correctly in
>>> >> >> >> previous versions? Or is there maybe something wrong with the new code?
>>> >> >> >>
>>> >> >> >> Actually, our company is currently preventing our projects from moving
>>> >> >> >> to Ceph because of this problem.
>>> >> >> >>
>>> >> >> >> Regards,
>>> >> >> >> Christian
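The manual check Christian refers to, sketched for a pool with three replicas; the PG and object names are the ones from this thread, and the paths and OSD ids are illustrative:

    # Find the PG's acting set, e.g. "acting [2,5,7]" (ids are hypothetical).
    ceph pg map 3.c6

    # On each OSD host in the acting set, checksum that OSD's copy of the object.
    find /var/lib/ceph/osd/ceph-*/current/3.c6_head \
         -name 'rb.0.b0ce3.238e1f29.00000000000b__head_34DC35C6__3' \
         -exec md5sum {} \;

    # With three replicas, the two matching checksums almost certainly identify
    # the good copies; the odd one out is the corrupted replica.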
>>> >> >> >> ________________________________
>>> >> >> >> From: ceph-users [ceph-users-bounces at lists.ceph.com] on behalf of
>>> >> >> >> Travis Rhoden [trhoden at gmail.com]
>>> >> >> >> Sent: Thursday, 10 July 2014 16:24
>>> >> >> >> To: Gregory Farnum
>>> >> >> >> Cc: ceph-users at lists.ceph.com
>>> >> >> >> Subject: Re: [ceph-users] scrub error on firefly
>>> >> >> >>
>>> >> >> >> And actually, just to follow up, it does seem like there are some
>>> >> >> >> additional smarts beyond just using the primary to overwrite the
>>> >> >> >> secondaries... Since I captured md5 sums before and after the repair,
>>> >> >> >> I can say that in this particular instance the secondary copy was used
>>> >> >> >> to overwrite the primary.
>>> >> >> >> So I'm just trusting Ceph to do the right thing, and so far it seems
>>> >> >> >> to, but the comments here about needing to determine the correct
>>> >> >> >> object and place it on the primary PG make me wonder if I've been
>>> >> >> >> missing something.
>>> >> >> >>
>>> >> >> >> - Travis
>>> >> >> >>
>>> >> >> >> On Thu, Jul 10, 2014 at 10:19 AM, Travis Rhoden <trhoden at gmail.com> wrote:
>>> >> >> >>>
>>> >> >> >>> I can also say that after a recent upgrade to Firefly, I have
>>> >> >> >>> experienced a massive uptick in scrub errors. The cluster was on
>>> >> >> >>> cuttlefish for about a year and had maybe one or two scrub errors.
>>> >> >> >>> After upgrading to Firefly, we've probably seen 3 to 4 dozen in the
>>> >> >> >>> last month or so (we were getting 2-3 a day for a few weeks until the
>>> >> >> >>> whole cluster was rescrubbed, it seemed).
>>> >> >> >>>
>>> >> >> >>> What I cannot determine, however, is how to know which object is
>>> >> >> >>> busted. For example, just today I ran into a scrub error. The object
>>> >> >> >>> has two copies and is an 8MB piece of an RBD, with identical
>>> >> >> >>> timestamps and identical xattr names and values. But it definitely
>>> >> >> >>> has a different MD5 sum on each copy. How do I know which one is
>>> >> >> >>> correct?
>>> >> >> >>>
>>> >> >> >>> I've been just kicking off pg repair each time, which seems to just
>>> >> >> >>> use the primary copy to overwrite the others. I haven't run into any
>>> >> >> >>> issues with that so far, but it does make me nervous.
>>> >> >> >>>
>>> >> >> >>> - Travis
>>> >> >> >>>
>>> >> >> >>> On Tue, Jul 8, 2014 at 1:06 AM, Gregory Farnum <greg at inktank.com> wrote:
>>> >> >> >>>>
>>> >> >> >>>> It's not very intuitive or easy to look at right now (there are
>>> >> >> >>>> plans from the recent developer summit to improve things), but the
>>> >> >> >>>> central log should have output about exactly what objects are busted.
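On a default installation, the central log Greg mentions is the cluster log on the monitor hosts, and the offending objects can be pulled out of it roughly like this (the log path and grep pattern are illustrative):

    # Scrub failures are recorded in the cluster log with the broken object's soid.
    grep ERR /var/log/ceph/ceph.log | grep -E 'deep-scrub|soid'

    # 'ceph health detail' lists the inconsistent PGs and their acting sets.
    ceph health detail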
>>> >> >> >>>> You'll then want to compare the copies manually to determine which
>>> >> >> >>>> ones are good or bad, get the good copy on the primary (make sure
>>> >> >> >>>> you preserve xattrs), and run repair.
>>> >> >> >>>> -Greg
>>> >> >> >>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>> >> >> >>>>
>>> >> >> >>>> On Mon, Jul 7, 2014 at 6:48 PM, Randy Smith <rbsmith at adams.edu> wrote:
>>> >> >> >>>> > Greetings,
>>> >> >> >>>> >
>>> >> >> >>>> > I upgraded to firefly last week and I suddenly received this error:
>>> >> >> >>>> >
>>> >> >> >>>> > health HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> >> >>>> >
>>> >> >> >>>> > ceph health detail shows the following:
>>> >> >> >>>> >
>>> >> >> >>>> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
>>> >> >> >>>> > pg 3.c6 is active+clean+inconsistent, acting [2,5]
>>> >> >> >>>> > 1 scrub errors
>>> >> >> >>>> >
>>> >> >> >>>> > The docs say that I can run `ceph pg repair 3.c6` to fix this.
>>> >> >> >>>> > What I want to know is what are the risks of data loss if I run
>>> >> >> >>>> > that command in this state and how can I mitigate them?
>>> >> >> >>>> >
>>> >> >> >>>> > --
>>> >> >> >>>> > Randall Smith
>>> >> >> >>>> > Computing Services
>>> >> >> >>>> > Adams State University
>>> >> >> >>>> > http://www.adams.edu/
>>> >> >> >>>> > 719-587-7741
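One way to mitigate the risk Randy asks about is to keep a copy of every replica before letting repair overwrite anything; a rough sketch, with illustrative backup paths:

    # On each OSD host in "acting [2,5]", back up that OSD's copies of the
    # suspect image's objects in this PG before repairing.
    mkdir -p /root/pg-3.c6-backup
    find /var/lib/ceph/osd/ceph-*/current/3.c6_head \
         -name 'rb.0.b0ce3.238e1f29.*' -exec cp -a {} /root/pg-3.c6-backup/ \;

    # Then trigger the repair. Per Sam's note above, repair does not vote
    # between copies, so verify the data afterwards rather than assuming the
    # surviving copy was the good one.
    ceph pg repair 3.c6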
>>
>> --
>> Randall Smith
>> Computing Services
>> Adams State University
>> http://www.adams.edu/
>> 719-587-7741