On 03/07/2014 00:55, Samuel Just wrote:
> Ah,
>
> ~/logs $ for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
> ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
> ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
> ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
> ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
> 6d5
> < tunable chooseleaf_vary_r 1
>
> Looks like the chooseleaf_vary_r tunable somehow ended up divergent?
>
> Pierre: do you recall how and when that got set?

I am not sure I fully understand, but if I remember correctly, after the upgrade to firefly the cluster was in the state:

HEALTH_WARN crush map has legacy tunables

and I saw "feature set mismatch" messages in the logs. So, again if I remember correctly, I ran:

ceph osd crush tunables optimal

to get rid of the crush map warning, and I upgraded my client and server kernels to 3.16-rc. Could that be the cause?
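To double-check on my side, I will compare what the monitors currently report with the decompiled crush map; something like this should do it (commands written from memory and not yet verified on this cluster, the /tmp paths are just examples):

  # show the tunables the cluster currently reports
  ceph osd crush show-tunables

  # decompile the current crush map and look for the tunable lines
  ceph osd getcrushmap -o /tmp/crush.current
  crushtool -d /tmp/crush.current -o /tmp/crush.current.txt
  grep tunable /tmp/crush.current.txt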
Pierre

> -Sam
>
> On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just <sam.just at inktank.com> wrote:
>> Yeah, divergent osdmaps:
>> 555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
>> 6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>>
>> Joao: thoughts?
>> -Sam
>>
>> On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
>> <pierre.blondeau at unicaen.fr> wrote:
>>> The files are attached.
>>>
>>> When I upgraded, I did:
>>>   ceph-deploy install --stable firefly servers...
>>>   on each server: service ceph restart mon
>>>   on each server: service ceph restart osd
>>>   on each server: service ceph restart mds
>>>
>>> I upgraded from emperor to firefly. After repair, remap, replace, etc.,
>>> I still had some PGs stuck in the peering state.
>>>
>>> I thought I would try version 0.82, hoping it might solve my problem
>>> (that was my mistake). So I upgraded from firefly to 0.82 with:
>>>   ceph-deploy install --testing servers...
>>>
>>> Now all daemons are at version 0.82.
>>> I have 3 mons, 36 OSDs and 3 MDSs.
>>>
>>> Pierre
>>>
>>> PS: I also find "inc\uosdmap.13258__0_469271DE__none" in each meta
>>> directory.
>>>
>>> On 03/07/2014 00:10, Samuel Just wrote:
>>>
>>>> Also, what version did you upgrade from, and how did you upgrade?
>>>> -Sam
>>>>
>>>> On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just <sam.just at inktank.com> wrote:
>>>>>
>>>>> Ok, in current/meta on osd 20 and osd 23, please attach all files
>>>>> matching
>>>>>
>>>>> ^osdmap.13258.*
>>>>>
>>>>> There should be one such file on each osd. (should look something like
>>>>> osdmap.6__0_FD6E4C01__none, probably hashed into a subdirectory,
>>>>> you'll want to use find).
>>>>>
>>>>> What version of ceph is running on your mons? How many mons do you have?
>>>>> -Sam
>>>>>
>>>>> On Wed, Jul 2, 2014 at 2:21 PM, Pierre BLONDEAU
>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I did it; the log files are available here:
>>>>>> https://blondeau.users.greyc.fr/cephlog/debug20/
>>>>>>
>>>>>> The OSD log files are really big, around 80 MB each.
>>>>>>
>>>>>> After starting osd.20, some other OSDs crashed: the number of OSDs up
>>>>>> went from 31 down to 16. I noticed that after this the number of
>>>>>> down+peering PGs decreased from 367 to 248. Is that "normal"? Maybe it
>>>>>> is temporary, while the cluster verifies all the PGs?
>>>>>>
>>>>>> Regards
>>>>>> Pierre
>>>>>>
>>>>>> On 02/07/2014 19:16, Samuel Just wrote:
>>>>>>
>>>>>>> You should add
>>>>>>>
>>>>>>> debug osd = 20
>>>>>>> debug filestore = 20
>>>>>>> debug ms = 1
>>>>>>>
>>>>>>> to the [osd] section of the ceph.conf and restart the osds. I'd like
>>>>>>> all three logs if possible.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Sam
>>>>>>>
>>>>>>> On Wed, Jul 2, 2014 at 5:03 AM, Pierre BLONDEAU
>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>
>>>>>>>> Yes, but how do I do that?
>>>>>>>>
>>>>>>>> With a command like this?
>>>>>>>>
>>>>>>>> ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>>>>>>>
>>>>>>>> Or by modifying /etc/ceph/ceph.conf? That file is very minimal
>>>>>>>> because I use udev detection.
>>>>>>>>
>>>>>>>> Once I have made these changes, do you want the three log files or
>>>>>>>> only osd.20's?
>>>>>>>>
>>>>>>>> Thank you so much for the help.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> On 01/07/2014 23:51, Samuel Just wrote:
>>>>>>>>
>>>>>>>>> Can you reproduce with
>>>>>>>>> debug osd = 20
>>>>>>>>> debug filestore = 20
>>>>>>>>> debug ms = 1
>>>>>>>>> ?
>>>>>>>>> -Sam
>>>>>>>>>
>>>>>>>>> On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU
>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I attach:
>>>>>>>>>>   - osd.20, one of the OSDs that I identified as making other OSDs crash.
>>>>>>>>>>   - osd.23, one of the OSDs which crashes when I start osd.20.
>>>>>>>>>>   - mds, one of my MDSs.
>>>>>>>>>>
>>>>>>>>>> I truncated the log files because they are too big. Everything is here:
>>>>>>>>>> https://blondeau.users.greyc.fr/cephlog/
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> On 30/06/2014 17:35, Gregory Farnum wrote:
>>>>>>>>>>
>>>>>>>>>>> What's the backtrace from the crashing OSDs?
>>>>>>>>>>>
>>>>>>>>>>> Keep in mind that as a dev release, it's generally best not to upgrade
>>>>>>>>>>> to unnamed versions like 0.82 (but it's probably too late to go back
>>>>>>>>>>> now).
>>>>>>>>>>
>>>>>>>>>> I will remember that next time ;)
>>>>>>>>>>
>>>>>>>>>>> -Greg
>>>>>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 30, 2014 at 8:06 AM, Pierre BLONDEAU
>>>>>>>>>>> <pierre.blondeau at unicaen.fr> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> After the upgrade to firefly, I had some PGs stuck in the peering state.
>>>>>>>>>>>> I saw that 0.82 had been released, so I tried upgrading to solve my
>>>>>>>>>>>> problem.
>>>>>>>>>>>>
>>>>>>>>>>>> My three MDSs crash, and some OSDs trigger a chain reaction that kills
>>>>>>>>>>>> other OSDs.
>>>>>>>>>>>> I think my MDSs will not start because their metadata are on the OSDs.
>>>>>>>>>>>>
>>>>>>>>>>>> I have 36 OSDs on three servers, and I identified 5 OSDs which make the
>>>>>>>>>>>> others crash. If I do not start those, the cluster goes into a recovery
>>>>>>>>>>>> state with 31 OSDs, but I have 378 PGs in down+peering state.
>>>>>>>>>>>>
>>>>>>>>>>>> What can I do? Would you like more information (OS, crash logs, etc.)?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>
>>>>>> --
>>>>>> ----------------------------------------------
>>>>>> Pierre BLONDEAU
>>>>>> Systems & network administrator
>>>>>> Université de Caen
>>>>>> Laboratoire GREYC, Département d'informatique
>>>>>>
>>>>>> tel : 02 31 56 75 42
>>>>>> office : Campus 2, Science 3, 406
>>>>>> ----------------------------------------------
>>>
>>> --
>>> ----------------------------------------------
>>> Pierre BLONDEAU
>>> Systems & network administrator
>>> Université de Caen
>>> Laboratoire GREYC, Département d'informatique
>>>
>>> tel : 02 31 56 75 42
>>> office : Campus 2, Science 3, 406
>>> ----------------------------------------------
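PS: In case it helps, this is roughly how I located the osdmap files you asked for (assuming the default OSD data path /var/lib/ceph/osd/ceph-<id>; adjust if your layout differs, and the pattern is deliberately loose because the file names contain escape characters):

  # locate the osdmap 13258 files in current/meta on osd.20 and osd.23
  find /var/lib/ceph/osd/ceph-20/current/meta -name 'osdmap*13258*'
  find /var/lib/ceph/osd/ceph-23/current/meta -name 'osdmap*13258*'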
--
----------------------------------------------
Pierre BLONDEAU
Systems & network administrator
Université de Caen
Laboratoire GREYC, Département d'informatique

tel : 02 31 56 75 42
office : Campus 2, Science 3, 406
----------------------------------------------