On 07/09/2014 02:22 PM, Pierre BLONDEAU wrote:
> Hi,
>
> Is there any chance to restore my data?

Okay, I talked to Sam and here's what you could try before anything else:

- Make sure you have everything running on the same version.
- Unset the chooseleaf_vary_r flag -- this can be accomplished by setting
  tunables to legacy.
- Have the osds join the cluster.
- You should then either upgrade to firefly (if you haven't done so by now)
  or wait for the point release before you move on to setting tunables to
  optimal again.
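Roughly, the sequence above would look something like this (just a sketch of
the order of operations, not a tested procedure; the restart line assumes the
same init scripts you have been using so far):

    # 1) drop the divergent chooseleaf_vary_r by going back to legacy tunables
    ceph osd crush tunables legacy

    # 2) on each server, restart the osds and let them rejoin
    service ceph restart osd

    # 3) watch until the osds are up/in and peering settles
    ceph -s

    # 4) only once everything is on firefly (or the point release)
    ceph osd crush tunables optimal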
Let us know how it goes.

  -Joao

>
> Regards
> Pierre
>
> On 07/07/2014 15:42, Pierre BLONDEAU wrote:
>> No chance of having those logs, and even less in debug mode: I made this
>> change 3 weeks ago.
>>
>> I put all my logs here in case they can help:
>> https://blondeau.users.greyc.fr/cephlog/all/
>>
>> Do I have a chance of recovering my +/- 20TB of data?
>>
>> Regards
>>
>> On 03/07/2014 21:48, Joao Luis wrote:
>>> Do those logs have a higher debugging level than the default? If not,
>>> never mind, as they will not have enough information. If they do,
>>> however, we'd be interested in the portion around the moment you set
>>> the tunables. Say, before the upgrade and a bit after you set the
>>> tunable. If you want to be finer grained, then ideally it would be the
>>> moment where those maps were created, but you'd have to grep the logs
>>> for that.
>>>
>>> Or drop the logs somewhere and I'll take a look.
>>>
>>> -Joao
>>>
>>> On Jul 3, 2014 5:48 PM, "Pierre BLONDEAU" <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> On 03/07/2014 13:49, Joao Eduardo Luis wrote:
>>>
>>> On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
>>>
>>> On 03/07/2014 00:55, Samuel Just wrote:
>>>
>>> Ah,
>>>
>>> ~/logs $ for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
>>> ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
>>> ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
>>> ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
>>> ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
>>> 6d5
>>> < tunable chooseleaf_vary_r 1
>>>
>>> Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
>>>
>>> The only thing that comes to mind that could cause this is if we
>>> changed the leader's in-memory map, proposed it, it failed, and only
>>> the leader got to write the map to disk somehow. This happened once on
>>> a totally different issue (although I can't pinpoint right now which).
>>>
>>> In such a scenario, the leader would serve the incorrect osdmap to
>>> whoever asked osdmaps from it, while the remaining quorum would serve
>>> the correct osdmaps to all the others. This could cause this
>>> divergence. Or it could be something else.
>>>
>>> Are there logs for the monitors for the timeframe this may have
>>> happened in?
>>>
>>> Which timeframe exactly do you want? I have 7 days of logs, so I
>>> should have information about the upgrade from firefly to 0.82.
>>> Which mon's log do you want? All three?
>>>
>>> Regards
>>>
>>> -Joao
>>>
>>> Pierre: do you recall how and when that got set?
>>>
>>> I am not sure I understand, but if I remember correctly, after the
>>> update to firefly I was in state HEALTH_WARN "crush map has legacy
>>> tunables" and I saw "feature set mismatch" in the logs.
>>>
>>> So, if I remember correctly, I ran "ceph osd crush tunables optimal"
>>> for the "crush map" problem, and I updated my client and server
>>> kernels to 3.16rc.
>>>
>>> Could it be that?
>>>
>>> Pierre
>>>
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>
>>> Yeah, divergent osdmaps:
>>> 555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
>>> 6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>>>
>>> Joao: thoughts?
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> The files.
>>>
>>> When I upgraded:
>>>   ceph-deploy install --stable firefly servers...
>>>   on each server: service ceph restart mon
>>>   on each server: service ceph restart osd
>>>   on each server: service ceph restart mds
>>>
>>> I upgraded from emperor to firefly. After repair, remap, replace,
>>> etc. ... I have some PGs which go into peering state.
>>>
>>> I thought, why not try version 0.82, it could solve my problem (that
>>> was my mistake). So I upgraded from firefly to 0.83 with:
>>>   ceph-deploy install --testing servers...
>>>
>>> Now, all programs are in version 0.82.
>>> I have 3 mons, 36 OSDs and 3 mds.
>>>
>>> Pierre
>>>
>>> PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta
>>> directory.
>>>
>>> On 03/07/2014 00:10, Samuel Just wrote:
>>>
>>> Also, what version did you upgrade from, and how did you upgrade?
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>
>>> Ok, in current/meta on osd 20 and osd 23, please attach all files
>>> matching
>>>
>>> ^osdmap.13258.*
>>>
>>> There should be one such file on each osd. (It should look something
>>> like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory;
>>> you'll want to use find.)
>>>
>>> What version of ceph is running on your mons? How many mons do you
>>> have?
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> Hi,
>>>
>>> I did it; the log files are available here:
>>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>>
>>> The OSDs' log files are really big, +/- 80M.
>>>
>>> After starting osd.20, some other osds crash: I went from 31 osds up
>>> to only 16. I noticed that after this the number of down+peering PGs
>>> decreased from 367 to 248. Is that "normal"? Maybe it's temporary,
>>> the time the cluster needs to verify all the PGs?
>>>
>>> Regards
>>> Pierre
>>>
>>> On 02/07/2014 19:16, Samuel Just wrote:
>>>
>>> You should add
>>>
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>>
>>> to the [osd] section of the ceph.conf and restart the osds. I'd like
>>> all three logs if possible.
>>>
>>> Thanks
>>> -Sam
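In ceph.conf form, what Sam is asking for is simply the following (the
section name is the stock one; nothing here is specific to this cluster):

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

The injectargs command Pierre quotes just below sets the same values at
runtime, without restarting the osds.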
>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> Yes, but how do I do that?
>>>
>>> With a command like this?
>>>
>>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>>
>>> Or by modifying /etc/ceph/ceph.conf? This file is really sparse
>>> because I use udev detection.
>>>
>>> Once I have made these changes, do you want the three log files or
>>> only osd.20's?
>>>
>>> Thank you so much for the help.
>>>
>>> Regards
>>> Pierre
>>>
>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>
>>> Can you reproduce with
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>> ?
>>> -Sam
>>>
>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> Hi,
>>>
>>> I attach:
>>> - osd.20: one of the osds that I identified as making other OSDs crash
>>> - osd.23: one of the osds which crash when I start osd.20
>>> - mds: one of my MDSs
>>>
>>> I cut the log files because they are too big. Everything is here:
>>> https://blondeau.users.greyc.fr/cephlog/
>>>
>>> Regards
>>>
>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>
>>> What's the backtrace from the crashing OSDs?
>>>
>>> Keep in mind that as a dev release, it's generally best not to upgrade
>>> to unnamed versions like 0.82 (but it's probably too late to go back
>>> now).
>>>
>>> I will remember it the next time ;)
>>>
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
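The backtraces Greg asks for should already be in the osd logs right after
each crash. Assuming the default log location, and taking osd.20 as an
example, something like this pulls them out:

    # assert failures are logged together with a stack trace
    grep -A 30 'FAILED assert' /var/log/ceph/ceph-osd.20.log

    # crashes that are not asserts usually print a "Caught signal" banner
    grep -A 30 'Caught signal' /var/log/ceph/ceph-osd.20.log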
>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU <pierre.blondeau at unicaen.fr> wrote:
>>>
>>> Hi,
>>>
>>> After the upgrade to firefly, I have some PGs in peering state. I saw
>>> that 0.82 was out, so I tried to upgrade to solve my problem.
>>>
>>> My three MDSs crash, and some OSDs trigger a chain reaction that kills
>>> other OSDs. I think my MDSs will not start because the metadata are on
>>> the OSDs.
>>>
>>> I have 36 OSDs on three servers, and I identified 5 OSDs which make
>>> the others crash. If I do not start them, the cluster goes into a
>>> rebuilding state with 31 OSDs, but I have 378 PGs in down+peering
>>> state.
>>>
>>> What can I do? Do you want more information (OS, crash logs, etc.)?
>>>
>>> Regards
>>>
>>> --
>>> ------------------------------------------------
>>> Pierre BLONDEAU
>>> Administrateur Systèmes & réseaux
>>> Université de Caen
>>> Laboratoire GREYC, Département d'informatique
>>>
>>> tel : 02 31 56 75 42
>>> bureau : Campus 2, Science 3, 406
>>> ------------------------------------------------


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com