Yeah, divergent osdmaps:

555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none

Joao: thoughts?
-Sam

On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
<pierre.blondeau at unicaen.fr> wrote:
> The files
>
> When I upgraded:
>   ceph-deploy install --stable firefly servers...
>   on each server: service ceph restart mon
>   on each server: service ceph restart osd
>   on each server: service ceph restart mds
>
> I upgraded from emperor to firefly. After repair, remap, replace, etc., I
> still had some PGs stuck in the peering state.
>
> I thought: why not try version 0.82, it might solve my problem (that was
> my mistake). So I upgraded from firefly to 0.82 with:
>   ceph-deploy install --testing servers...
>   ..
>
> Now all daemons are at version 0.82.
> I have 3 mons, 36 OSDs and 3 MDSes.
>
> Pierre
>
> PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta
> directory.
>
> On 03/07/2014 00:10, Samuel Just wrote:
>
>> Also, what version did you upgrade from, and how did you upgrade?
>> -Sam
>>
>> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>
>>> Ok, in current/meta on osd 20 and osd 23, please attach all files
>>> matching
>>>
>>>   ^osdmap.13258.*
>>>
>>> There should be one such file on each osd. (It should look something
>>> like osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory;
>>> you'll want to use find.)
>>>
>>> What version of ceph is running on your mons? How many mons do you have?
>>> -Sam
>>>
>>> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I did it; the log files are available here:
>>>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>>>
>>>> The OSD log files are really big, +/- 80 MB each.
>>>>
>>>> After starting osd.20, some other OSDs crashed. The number of OSDs up
>>>> dropped from 31 to 16.
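[Editor's note: Sam's suggestion to locate the hashed osdmap objects with find, combined with the md5 comparison at the top of the thread, could be sketched roughly as follows. This is an assumption-laden sketch: /var/lib/ceph/osd/ceph-$id is only the default OSD data path and may differ on a given deployment.]

```shell
# For each suspect OSD, find the epoch-13258 full-map object under
# current/meta (it is usually hashed into a DIR_* subdirectory) and
# fingerprint it. Differing checksums across OSDs indicate divergent maps.
for id in 20 23; do
    find "/var/lib/ceph/osd/ceph-$id/current/meta" \
         -name 'osdmap.13258__*' -exec md5sum {} +
done
```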
>>>> I noticed that after this, the number of down+peering PGs decreased
>>>> from 367 to 248. Is that "normal"? Maybe it's temporary, just the time
>>>> the cluster needs to verify all the PGs?
>>>>
>>>> Regards
>>>> Pierre
>>>>
>>>> On 02/07/2014 19:16, Samuel Just wrote:
>>>>
>>>>> You should add
>>>>>
>>>>>   debug osd = 20
>>>>>   debug filestore = 20
>>>>>   debug ms = 1
>>>>>
>>>>> to the [osd] section of the ceph.conf and restart the osds. I'd like
>>>>> all three logs if possible.
>>>>>
>>>>> Thanks
>>>>> -Sam
>>>>>
>>>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>
>>>>>> Yes, but how do I do that?
>>>>>>
>>>>>> With a command like this?
>>>>>>
>>>>>>   ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20
>>>>>>   --debug-ms 1'
>>>>>>
>>>>>> Or by modifying /etc/ceph/ceph.conf? That file is really sparse
>>>>>> because I use udev detection.
>>>>>>
>>>>>> Once I have made these changes, do you want all three log files or
>>>>>> only osd.20's?
>>>>>>
>>>>>> Thank you so much for the help
>>>>>>
>>>>>> Regards
>>>>>> Pierre
>>>>>>
>>>>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>>>>
>>>>>>> Can you reproduce with
>>>>>>>   debug osd = 20
>>>>>>>   debug filestore = 20
>>>>>>>   debug ms = 1
>>>>>>> ?
>>>>>>> -Sam
>>>>>>>
>>>>>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I attach:
>>>>>>>> - osd.20, one of the OSDs that I identified as making other OSDs
>>>>>>>>   crash.
>>>>>>>> - osd.23, one of the OSDs that crashes when I start osd.20.
>>>>>>>> - mds, one of my MDSes.
>>>>>>>>
>>>>>>>> I cut the log files because they are too big. Everything is here:
>>>>>>>> https://blondeau.users.greyc.fr/cephlog/
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>>>>>
>>>>>>>>> What's the backtrace from the crashing OSDs?
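[Editor's note: the two approaches Pierre asks about above could be sketched together like this. The injectargs arguments and the [osd] settings are taken verbatim from the thread; the append-to-ceph.conf step is one possible way to apply them, assuming the stock /etc/ceph/ceph.conf path.]

```shell
# Option 1: raise debug levels on a running daemon immediately
# (takes effect at once, but does not survive a daemon restart).
ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'

# Option 2: persist the settings for all OSDs in ceph.conf, then restart.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1
EOF
service ceph restart osd
```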
>>>>>>>>>
>>>>>>>>> Keep in mind that as a dev release, it's generally best not to
>>>>>>>>> upgrade to unnamed versions like 0.82 (but it's probably too late
>>>>>>>>> to go back now).
>>>>>>>>
>>>>>>>> I will remember that next time ;)
>>>>>>>>
>>>>>>>>> -Greg
>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>
>>>>>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> After the upgrade to firefly, I had some PGs stuck in the peering
>>>>>>>>>> state. I saw that 0.82 had come out, so I tried upgrading to
>>>>>>>>>> solve my problem.
>>>>>>>>>>
>>>>>>>>>> My three MDSes crash, and some OSDs trigger a chain reaction that
>>>>>>>>>> kills other OSDs.
>>>>>>>>>> I think my MDSes will not start because their metadata are on the
>>>>>>>>>> OSDs.
>>>>>>>>>>
>>>>>>>>>> I have 36 OSDs on three servers, and I identified 5 OSDs which
>>>>>>>>>> make the others crash. If I don't start them, the cluster goes
>>>>>>>>>> into a recovering state with 31 OSDs, but I have 378 PGs in the
>>>>>>>>>> down+peering state.
>>>>>>>>>>
>>>>>>>>>> What can I do? Do you need more information (OS, crash logs,
>>>>>>>>>> etc.)?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>
>>>> --
>>>> ----------------------------------------------
>>>> Pierre BLONDEAU
>>>> Systems & Networks Administrator
>>>> Université de Caen
>>>> Laboratoire GREYC, Département d'informatique
>>>>
>>>> tel : 02 31 56 75 42
>>>> office : Campus 2, Science 3, 406
>>>> ----------------------------------------------
>
> --
> ----------------------------------------------
> Pierre BLONDEAU
> Systems & Networks Administrator
> Université de Caen
> Laboratoire GREYC, Département d'informatique
>
> tel : 02 31 56 75 42
> office : Campus 2, Science 3, 406
> ----------------------------------------------