There is no chance I still have those logs, and even less so in debug mode: I made this change 3 weeks ago. I have put all my logs here in case they can help: https://blondeau.users.greyc.fr/cephlog/all/

Do I have a chance of recovering my +/- 20 TB of data?

Regards

On 03/07/2014 21:48, Joao Luis wrote:
> Do those logs have a higher debugging level than the default? If not,
> never mind, as they will not have enough information. If they do, however,
> we'd be interested in the portion around the moment you set the
> tunables. Say, before the upgrade and a bit after you set the tunable.
> If you want to be finer grained, then ideally it would be the moment
> where those maps were created, but you'd have to grep the logs for that.
>
> Or drop the logs somewhere and I'll take a look.
>
> -Joao
>
> On Jul 3, 2014 5:48 PM, "Pierre BLONDEAU" <pierre.blondeau at unicaen.fr> wrote:
>
> On 03/07/2014 13:49, Joao Eduardo Luis wrote:
>
> On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
>
> On 03/07/2014 00:55, Samuel Just wrote:
>
> Ah,
>
> ~/logs $ for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
> ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
> ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
> ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
> ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
> 6d5
> < tunable chooseleaf_vary_r 1
>
> Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
>
> The only thing that comes to mind that could cause this is if we changed
> the leader's in-memory map, proposed it, it failed, and only the leader
> got to write the map to disk somehow. This happened once on a totally
> different issue (although I can't pinpoint right now which).
>
> In such a scenario, the leader would serve the incorrect osdmap to
> whoever asked it for osdmaps, while the remaining quorum would serve the
> correct osdmaps to all the others. This could cause this divergence. Or
> it could be something else.
>
> Are there logs for the monitors for the timeframe this may have happened in?
>
> Which timeframe exactly do you want? I have 7 days of logs, so I should
> have information about the upgrade from firefly to 0.82.
> Which mon's logs do you want? All three?
>
> Regards
>
> -Joao
>
> Pierre: do you recall how and when that got set?
>
> I am not sure I understand, but if I remember correctly, after the update
> to firefly the cluster was in the state "HEALTH_WARN crush map has legacy
> tunables" and I saw "feature set mismatch" in the logs.
>
> So, if I remember correctly, I ran "ceph osd crush tunables optimal" to fix
> the "crush map" warning, and I updated my client and server kernels to 3.16rc.
>
> Could it be that?
>
> Pierre
>
> -Sam
>
> On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.just at inktank.com> wrote:
>
> Yeah, divergent osdmaps:
> 555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
> 6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>
> Joao: thoughts?
> -Sam
>
> On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>
> The files.
>
> When I upgraded:
> ceph-deploy install --stable firefly servers...
> on each server: service ceph restart mon
> on each server: service ceph restart osd
> on each server: service ceph restart mds
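(As a rough sketch only: the restart sequence above, looped over the cluster nodes. The hostnames are placeholders, passwordless SSH is assumed, and it uses the same sysvinit "service ceph" wrapper as the commands quoted above; it is not the exact procedure that was used here.)

    # Sketch: restart mons first, then OSDs, then MDSs, on hypothetical hosts server1..server3.
    for daemon in mon osd mds; do
        for host in server1 server2 server3; do
            ssh "$host" "service ceph restart $daemon"
        done
    done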
>
> I upgraded from emperor to firefly. After repair, remap, replace,
> etc., some PGs ended up stuck in the peering state.
>
> I thought: why not try version 0.82, it could solve my problem
> (that was my mistake). So I upgraded from firefly to 0.83 with:
> ceph-deploy install --testing servers...
> ...
>
> Now all daemons are running version 0.82.
> I have 3 mons, 36 OSDs and 3 MDSs.
>
> Pierre
>
> PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta directory.
>
> On 03/07/2014 00:10, Samuel Just wrote:
>
> Also, what version did you upgrade from, and how did you upgrade?
> -Sam
>
> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
>
> Ok, in current/meta on osd 20 and osd 23, please attach all files matching
>
> ^osdmap.13258.*
>
> There should be one such file on each osd. (It should look something like
> osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory;
> you'll want to use find.)
>
> What version of ceph is running on your mons? How many mons do you have?
> -Sam
>
> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>
> Hi,
>
> I did it; the log files are available here:
> https://blondeau.users.greyc.fr/cephlog/debug20/
>
> The OSD log files are really big, +/- 80 MB.
>
> After starting osd.20, some other OSDs crashed; I went from 31 OSDs up to 16.
> I noticed that after this the number of down+peering PGs decreased from 367
> to 248. Is that "normal"? Maybe it's temporary, while the cluster verifies
> all the PGs?
>
> Regards
> Pierre
>
> On 02/07/2014 19:16, Samuel Just wrote:
>
> You should add
>
> debug osd = 20
> debug filestore = 20
> debug ms = 1
>
> to the [osd] section of the ceph.conf and restart the osds. I'd like
> all three logs if possible.
>
> Thanks
> -Sam
>
> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>
> Yes, but how do I do that?
>
> With a command like this?
>
> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>
> Or by modifying /etc/ceph/ceph.conf? That file is really sparse because
> I use udev detection.
>
> Once I have made these changes, do you want the three log files or only
> osd.20's?
>
> Thank you so much for the help
>
> Regards
> Pierre
>
> On 01/07/2014 23:51, Samuel Just wrote:
>
> Can you reproduce with
> debug osd = 20
> debug filestore = 20
> debug ms = 1
> ?
> -Sam
>
> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>
> Hi,
>
> I attach:
> - osd.20: one of the OSDs that I identified as making other OSDs crash.
> - osd.23: one of the OSDs that crashes when I start osd.20.
> - mds: one of my MDSs.
>
> I cut the log files because they are too big, but everything is here:
> https://blondeau.users.greyc.fr/cephlog/
>
> Regards
>
> On 30/06/2014 17:35, Gregory Farnum wrote:
>
> What's the backtrace from the crashing OSDs?
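(If it helps, here is a minimal way to pull such a backtrace out of an OSD log. It is only a sketch: it assumes the default log location under /var/log/ceph/ and the usual "FAILED assert" / "Caught signal" markers that a crashing Ceph daemon prints.)

    # Sketch: show the lines around the crash in osd.20's log (default path assumed).
    grep -n -A 40 -E 'FAILED assert|Caught signal' /var/log/ceph/ceph-osd.20.log | tail -n 80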
>
> Keep in mind that as a dev release, it's generally best not to upgrade
> to unnamed versions like 0.82 (but it's probably too late to go back now).
>
> I will remember that next time ;)
>
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>
> Hi,
>
> After the upgrade to firefly, I had some PGs in the peering state.
> I saw that 0.82 had come out, so I tried to upgrade to solve my problem.
>
> My three MDSs crash, and some OSDs trigger a chain reaction that kills
> other OSDs. I think my MDSs will not start because their metadata are
> on the OSDs.
>
> I have 36 OSDs on three servers, and I identified 5 OSDs which make the
> others crash. If I do not start those, the cluster goes into a recovering
> state with 31 OSDs, but I have 378 PGs in the down+peering state.
>
> What can I do? Would you like more information (OS, crash logs, etc.)?
>
> Regards

--
----------------------------------------------
Pierre BLONDEAU
Systems & Network Administrator
Université de Caen
Laboratoire GREYC, Département d'informatique

tel    : 02 31 56 75 42
office : Campus 2, Science 3, 406
----------------------------------------------
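(For anyone landing here with a similar down+peering situation: a minimal sketch of how to see which PGs are stuck and why, using standard ceph CLI commands. The PG id below is only a placeholder.)

    # Sketch: inspect stuck PGs; 2.1f is a hypothetical placeholder PG id.
    ceph health detail            # lists unhealthy PGs with their states
    ceph pg dump_stuck inactive   # PGs that are not active (e.g. down+peering)
    ceph pg 2.1f query            # per-PG peering details, including which OSDs it is waiting for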