Ah,

~/logs $ for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush20
../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
../ceph/src/osdmaptool: exported crush map to /tmp/crush23
6d5
< tunable chooseleaf_vary_r 1

Looks like the chooseleaf_vary_r tunable somehow ended up divergent?

Pierre: do you recall how and when that got set?
-Sam
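For reference, chooseleaf_vary_r is normally set either by switching the cluster to the firefly tunable profile or by hand-editing a decompiled CRUSH map; the following is a minimal sketch of both paths, assuming a default firefly cluster (the /tmp paths are placeholders, not files from this thread):

  # Profile route: the firefly profile enables chooseleaf_vary_r=1 (among other tunables)
  ceph osd crush tunables firefly

  # Manual route: extract, decompile, edit, recompile and re-inject the CRUSH map
  ceph osd getcrushmap -o /tmp/crush.bin
  crushtool -d /tmp/crush.bin -o /tmp/crush.txt
  # add "tunable chooseleaf_vary_r 1" to the tunables section of /tmp/crush.txt, then:
  crushtool -c /tmp/crush.txt -o /tmp/crush.new
  ceph osd setcrushmap -i /tmp/crush.new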
On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.just at inktank.com> wrote:
> Yeah, divergent osdmaps:
> 555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
> 6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>
> Joao: thoughts?
> -Sam
>
> On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
> <pierre.blondeau at unicaen.fr> wrote:
>> Here are the files.
>>
>> When I upgraded:
>>   ceph-deploy install --stable firefly servers...
>>   on each server: service ceph restart mon
>>   on each server: service ceph restart osd
>>   on each server: service ceph restart mds
>>
>> I upgraded from emperor to firefly. After repair, remap, replace, etc., I
>> have some PGs which stay in peering state.
>>
>> I thought: why not try version 0.82, it might solve my problem (that was
>> my mistake). So I upgraded from firefly to 0.82 with:
>>   ceph-deploy install --testing servers...
>>   ..
>>
>> Now, all programs are at version 0.82.
>> I have 3 mons, 36 OSDs and 3 MDSs.
>>
>> Pierre
>>
>> PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta
>> directory.
>>
>> On 03/07/2014 00:10, Samuel Just wrote:
>>
>>> Also, what version did you upgrade from, and how did you upgrade?
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>>
>>>> Ok, in current/meta on osd 20 and osd 23, please attach all files
>>>> matching
>>>>
>>>> ^osdmap.13258.*
>>>>
>>>> There should be one such file on each osd. (It should look something
>>>> like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory;
>>>> you'll want to use find.)
>>>>
>>>> What version of ceph is running on your mons? How many mons do you have?
>>>> -Sam
>>>>
>>>> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I did it; the log files are available here:
>>>>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>>>>
>>>>> The OSD log files are really big, +/- 80M.
>>>>>
>>>>> After starting osd.20, some other OSDs crash; I go from 31 OSDs down to
>>>>> 16. I notice that after this the number of down+peering PGs decreases
>>>>> from 367 to 248. Is that "normal"? Maybe it's temporary, while the
>>>>> cluster verifies all the PGs?
>>>>>
>>>>> Regards
>>>>> Pierre
>>>>>
>>>>> On 02/07/2014 19:16, Samuel Just wrote:
>>>>>
>>>>>> You should add
>>>>>>
>>>>>> debug osd = 20
>>>>>> debug filestore = 20
>>>>>> debug ms = 1
>>>>>>
>>>>>> to the [osd] section of the ceph.conf and restart the osds. I'd like
>>>>>> all three logs if possible.
>>>>>>
>>>>>> Thanks
>>>>>> -Sam
>>>>>>
>>>>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>
>>>>>>> Yes, but how do I do that?
>>>>>>>
>>>>>>> With a command like this?
>>>>>>>
>>>>>>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20
>>>>>>> --debug-ms 1'
>>>>>>>
>>>>>>> Or by modifying /etc/ceph/ceph.conf? This file is really sparse
>>>>>>> because I use udev detection.
>>>>>>>
>>>>>>> Once I have made these changes, do you want all three log files or
>>>>>>> only osd.20's?
>>>>>>>
>>>>>>> Thank you so much for the help
>>>>>>>
>>>>>>> Regards
>>>>>>> Pierre
>>>>>>>
>>>>>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>>>>>
>>>>>>>> Can you reproduce with
>>>>>>>> debug osd = 20
>>>>>>>> debug filestore = 20
>>>>>>>> debug ms = 1
>>>>>>>> ?
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I attach:
>>>>>>>>>   - osd.20 is one of the OSDs that I identified as making other
>>>>>>>>>     OSDs crash.
>>>>>>>>>   - osd.23 is one of the OSDs which crash when I start osd.20
>>>>>>>>>   - mds is one of my MDSs
>>>>>>>>>
>>>>>>>>> I cut the log files because they are too big. Everything is here:
>>>>>>>>> https://blondeau.users.greyc.fr/cephlog/
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>>>>>>
>>>>>>>>>> What's the backtrace from the crashing OSDs?
>>>>>>>>>>
>>>>>>>>>> Keep in mind that as a dev release, it's generally best not to
>>>>>>>>>> upgrade to unnamed versions like 0.82 (but it's probably too late
>>>>>>>>>> to go back now).
>>>>>>>>>
>>>>>>>>> I will remember that next time ;)
>>>>>>>>>
>>>>>>>>>> -Greg
>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> After the upgrade to firefly, I have some PGs in peering state.
>>>>>>>>>>> I saw the release of 0.82, so I tried to upgrade to solve my
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> My three MDSs crash, and some OSDs trigger a chain reaction that
>>>>>>>>>>> kills other OSDs.
>>>>>>>>>>> I think my MDSs will not start because their metadata are on the
>>>>>>>>>>> OSDs.
>>>>>>>>>>>
>>>>>>>>>>> I have 36 OSDs on three servers, and I identified 5 OSDs which
>>>>>>>>>>> make the others crash. If I do not start them, the cluster goes
>>>>>>>>>>> into a recovery state with 31 OSDs, but I have 378 PGs in
>>>>>>>>>>> down+peering state.
>>>>>>>>>>>
>>>>>>>>>>> What can I do? Would you like more information (OS, crash logs,
>>>>>>>>>>> etc.)?
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>
>>>>>
>>>>> --
>>>>> ----------------------------------------------
>>>>> Pierre BLONDEAU
>>>>> Systems & networks administrator
>>>>> Université de Caen
>>>>> Laboratoire GREYC, Département d'informatique
>>>>>
>>>>> tel : 02 31 56 75 42
>>>>> office : Campus 2, Science 3, 406
>>>>> ----------------------------------------------
>>
>>
>> --
>> ----------------------------------------------
>> Pierre BLONDEAU
>> Systems & networks administrator
>> Université de Caen
>> Laboratoire GREYC, Département d'informatique
>>
>> tel : 02 31 56 75 42
>> office : Campus 2, Science 3, 406
>> ----------------------------------------------
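For the osdmap files requested earlier in the thread (^osdmap.13258.*), a minimal sketch of locating and comparing them with find and md5sum; the /var/lib/ceph/osd/ceph-<id> data directory is an assumption based on a default deployment, so adjust it to the actual mount points:

  # The osdmap objects live under current/meta, usually nested in hashed subdirectories
  for id in 20 23; do
    find /var/lib/ceph/osd/ceph-$id/current/meta -name 'osdmap.13258*' -exec md5sum {} \;
  done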