Some OSD and MDS crash


 



On 07/09/2014 02:22 PM, Pierre BLONDEAU wrote:
> Hi,
>
> Is there any chance to restore my data?

Hello Pierre,

I've been giving this some thought and my guess is that yes, it should 
be possible.  However, it may not be a simple fix.

So, first of all, you got bit by http://tracker.ceph.com/issues/8738,
which has been resolved; the fix should be available in the next firefly
point release.

However, I doubt just upgrading will solve all your problems.  You'll 
have some OSDs with maps containing the chooseleaf_vary_r flag, while 
other OSDs won't.  You'll also have monitors serving such maps, while 
other monitors won't.

This may very well mean having to enable the flag, throughout the cluster,
in all those maps that don't have it set.  That, in turn, would mean
putting together a tool to rewrite those maps in place while the affected
daemon is offline.
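
Something along these lines might do it for a single on-disk map. This is
an untested sketch, not a supported procedure: the OSD id and epoch below
are just the ones from earlier in this thread, so adjust per OSD, keep
backups, and try it on a throwaway copy first:

   # run with the OSD stopped; ceph-23 and epoch 13258 are assumptions
   osd=/var/lib/ceph/osd/ceph-23
   map=$(find $osd/current/meta -name 'osdmap.13258__*')
   cp $map $map.bak                            # keep the original around
   osdmaptool --export-crush /tmp/crush $map   # extract the crush map
   crushtool -d /tmp/crush -o /tmp/crush.txt   # decompile to text
   # add the line "tunable chooseleaf_vary_r 1" to /tmp/crush.txt, then
   # recompile it and write it back into the osdmap object:
   crushtool -c /tmp/crush.txt -o /tmp/crush.new
   osdmaptool --import-crush /tmp/crush.new $map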

There may, however, be another way; although simpler, it is more intrusive:

First of all we'd have to know which monitor is the one with the 
appropriate maps (this would certainly be the firefly monitor), which 
I'm assuming is still online.

Then we'd have to remove all remaining monitors and add new, firefly 
monitors.  This way they'd sync up with the monitor with the correct maps.
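
That would be the usual remove/re-add dance, roughly as follows (mon names
and paths here are placeholders; mon.b stands in for a divergent monitor):

   ceph mon remove b                    # drop the divergent monitor
   rm -rf /var/lib/ceph/mon/ceph-b      # wipe its old store
   # reinstall firefly on that node, then rebuild the mon so it syncs its
   # store from the monitor holding the correct maps:
   ceph mon getmap -o /tmp/monmap
   ceph-mon --mkfs -i b --monmap /tmp/monmap --keyring /path/to/mon.keyring
   service ceph start mon.b             # repeat for each divergent monitor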

Then we'd have to pin down the map epoch at which this whole thing
happened, and copy all maps from that point forward from the up OSDs to
the OSDs holding divergent maps.
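
Pinning down the divergent epoch could be done the same way Sam compared
epoch 13258 below: checksum the same osdmap objects on a good OSD and a
bad one, and walk the epochs until the hashes stop matching.  A rough
sketch, assuming the default filestore layout (the OSD ids and the epoch
range are placeholders):

   good=/var/lib/ceph/osd/ceph-23/current/meta
   bad=/var/lib/ceph/osd/ceph-20/current/meta
   for e in $(seq 13200 13258); do
       h1=$(md5sum $(find $good -name "osdmap.${e}__*") | cut -d' ' -f1)
       h2=$(md5sum $(find $bad  -name "osdmap.${e}__*") | cut -d' ' -f1)
       [ "$h1" != "$h2" ] && echo "epoch $e diverges"
   done
   # then, with the divergent OSDs stopped, copy the good objects over the
   # bad ones (same relative paths, keeping backups) and restart them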

It would be nice if Sam could chime in and validate either approach.

   -Joao


>
> Regards
> Pierre
>
> On 07/07/2014 15:42, Pierre BLONDEAU wrote:
>> There is no chance of getting those logs, and even less in debug mode:
>> I made that change 3 weeks ago.
>>
>> I have put all my logs here in case they can help:
>> https://blondeau.users.greyc.fr/cephlog/all/
>>
>> Is there a chance to recover my +/- 20TB of data?
>>
>> Regards
>>
>> On 03/07/2014 21:48, Joao Luis wrote:
>>> Do those logs have a higher debugging level than the default? If not,
>>> never mind, as they will not have enough information. If they do,
>>> however, we'd be interested in the portion around the moment you set the
>>> tunables; say, before the upgrade and a bit after you set the tunable.
>>> If you want to be finer grained, then ideally it would be the moment
>>> those maps were created, but you'd have to grep the logs for that.
>>>
>>> Or drop the logs somewhere and I'll take a look.
>>>
>>>    -Joao
>>>
>>> On Jul 3, 2014 5:48 PM, "Pierre BLONDEAU" <pierre.blondeau at unicaen.fr> wrote:
>>>
>>>         On 03/07/2014 13:49, Joao Eduardo Luis wrote:
>>>
>>>         On 07/03/2014 12:15 AM, Pierre BLONDEAU wrote:
>>>
>>>             On 03/07/2014 00:55, Samuel Just wrote:
>>>
>>>                 Ah,
>>>
>>>                 ~/logs $ for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp/crush$i osd-$i*; ../ceph/src/crushtool -d /tmp/crush$i > /tmp/crush$i.d; done; diff /tmp/crush20.d /tmp/crush23.d
>>>                 ../ceph/src/osdmaptool: osdmap file 'osd-20_osdmap.13258__0_4E62BB79__none'
>>>                 ../ceph/src/osdmaptool: exported crush map to /tmp/crush20
>>>                 ../ceph/src/osdmaptool: osdmap file 'osd-23_osdmap.13258__0_4E62BB79__none'
>>>                 ../ceph/src/osdmaptool: exported crush map to /tmp/crush23
>>>                 6d5
>>>                 < tunable chooseleaf_vary_r 1
>>>
>>>                   Looks like the chooseleaf_vary_r tunable somehow ended
>>>                 up divergent?
>>>
>>>
>>>         The only thing that comes to mind that could cause this is if we
>>>         changed
>>>         the leader's in-memory map, proposed it, it failed, and only the
>>>         leader
>>>         got to write the map to disk somehow.  This happened once on a
>>>         totally different issue (although I can't pinpoint which one
>>>         right now).
>>>
>>>         In such a scenario, the leader would serve the incorrect osdmap
>>>         to whoever asked it for osdmaps, while the remaining quorum would
>>>         serve the correct osdmaps to everyone else.  This could cause
>>>         this divergence.  Or it could be something else.
>>>
>>>         Are there logs for the monitors for the timeframe in which this
>>>         may have happened?
>>>
>>>
>>>     Exactly which timeframe do you want? I have 7 days of logs, so I
>>>     should have information about the upgrade from firefly to 0.82.
>>>     Which mon's logs do you want? All three?
>>>
>>>     Regards
>>>
>>>             -Joao
>>>
>>>
>>>                 Pierre: do you recall how and when that got set?
>>>
>>>
>>>             I am not sure I understand, but if I remember correctly,
>>>             after the update to firefly I was in the state "HEALTH_WARN
>>>             crush map has legacy tunables" and I saw "feature set
>>>             mismatch" in the logs.
>>>
>>>             So, if I remember correctly, I ran: ceph osd crush tunables
>>>             optimal for the "crush map" warning, and I updated my client
>>>             and server kernels to 3.16rc.
>>>
>>>             Could it be that?
>>>
>>>             Pierre
>>>
>>>                 -Sam
>>>
>>>                 On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just
>>>                 <sam.just at inktank.com> wrote:
>>>
>>>                     Yeah, divergent osdmaps:
>>>                     555ed048e73024687fc8b106a570db4f  osd-20_osdmap.13258__0_4E62BB79__none
>>>                     6037911f31dc3c18b05499d24dcdbe5c  osd-23_osdmap.13258__0_4E62BB79__none
>>>
>>>                     Joao: thoughts?
>>>                     -Sam
>>>
>>>                     On Wed, Jul 2, 2014 at 3:39 PM, Pierre BLONDEAU
>>>                     <pierre.blondeau at unicaen.fr> wrote:
>>>
>>>                         The files.
>>>
>>>                         When I upgraded:
>>>                            ceph-deploy install --stable firefly servers...
>>>                            on each server: service ceph restart mon
>>>                            on each server: service ceph restart osd
>>>                            on each server: service ceph restart mds
>>>
>>>                         I upgraded from emperor to firefly. After repair,
>>>                         remap, replace, etc. I had some PGs stuck in the
>>>                         peering state.
>>>
>>>                         I thought: why not try version 0.82, it could
>>>                         solve my problem (that was my mistake). So I
>>>                         upgraded from firefly to 0.82 with:
>>>                            ceph-deploy install --testing servers...
>>>                            ...
>>>
>>>                         Now all daemons are at version 0.82.
>>>                         I have 3 mons, 36 OSDs and 3 MDSs.
>>>
>>>                         Pierre
>>>
>>>                         PS: I also found "inc\uosdmap.13258__0_469271DE__none"
>>>                         in each meta directory.
>>>
>>>                         On 03/07/2014 00:10, Samuel Just wrote:
>>>
>>>                             Also, what version did you upgrade from, and
>>>                             how did you upgrade?
>>>                             -Sam
>>>
>>>                             On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just
>>>                             <sam.just at inktank.com> wrote:
>>>
>>>
>>>                                 Ok, in current/meta on osd 20 and osd 23,
>>>                                 please attach all files matching
>>>
>>>                                 ^osdmap.13258.*
>>>
>>>                                 There should be one such file on each osd
>>>                                 (it should look something like
>>>                                 osdmap.6__0_FD6E4C01__none, probably hashed
>>>                                 into a subdirectory; you'll want to use
>>>                                 find).
>>>
>>>                                 What version of ceph is running on your
>>>                                 mons?  How many mons do
>>>                                 you have?
>>>                                 -Sam
>>>
>>>                                 On Wed, Jul 2, 2014 at 2:21 PM, Pierre
>>>                                 BLONDEAU <pierre.blondeau at unicaen.fr>
>>>                                 wrote:
>>>
>>>
>>>                                     Hi,
>>>
>>>                                     I did it; the log files are available
>>>                                     here:
>>>                                     https://blondeau.users.greyc.fr/cephlog/debug20/
>>>
>>>                                     The OSDs' log files are really big,
>>>                                     +/- 80M each.
>>>
>>>                                     After starting osd.20, some other OSDs
>>>                                     crashed; I went from 31 OSDs up to 16.
>>>                                     I noticed that after this, the number
>>>                                     of down+peering PGs decreased from 367
>>>                                     to 248. Is that "normal"? Maybe it's
>>>                                     temporary, the time the cluster takes
>>>                                     to verify all the PGs?
>>>
>>>                                     Regards
>>>                                     Pierre
>>>
>>>                                     On 02/07/2014 19:16, Samuel Just wrote:
>>>
>>>                                         You should add
>>>
>>>                                         debug osd = 20
>>>                                         debug filestore = 20
>>>                                         debug ms = 1
>>>
>>>                                         to the [osd] section of the
>>>                                         ceph.conf and restart the osds.
>>>                                         I'd like all three logs if
>>>                                         possible.
>>>
>>>                                         Thanks
>>>                                         -Sam
>>>
>>>                                         On Wed, Jul 2, 2014 at 5:03 AM,
>>>                                         Pierre BLONDEAU
>>>                                         <pierre.blondeau at unicaen.fr>
>>>                                         wrote:
>>>
>>>
>>>
>>>                                             Yes, but how do I do that?
>>>                                             With a command like this?
>>>
>>>                                             ceph tell osd.20 injectargs
>>>                                             '--debug-osd 20
>>>                                             --debug-filestore 20
>>>                                             --debug-ms 1'
>>>
>>>                                             Or by modifying
>>>                                             /etc/ceph/ceph.conf? That file
>>>                                             is really sparse because I use
>>>                                             udev detection.
>>>
>>>                                             Once I have made these changes,
>>>                                             do you want all three log files
>>>                                             or only osd.20's?
>>>
>>>                                             Thank you so much for the help.
>>>
>>>                                             Regards
>>>                                             Pierre
>>>
>>>                                             On 01/07/2014 23:51, Samuel
>>>                                             Just wrote:
>>>
>>>                                                 Can you reproduce with
>>>                                                 debug osd = 20
>>>                                                 debug filestore = 20
>>>                                                 debug ms = 1
>>>                                                 ?
>>>                                                 -Sam
>>>
>>>                                                 On Tue, Jul 1, 2014 at
>>>                                                 1:21 AM, Pierre BLONDEAU
>>>                                                 <pierre.blondeau at unicaen.fr>
>>>                                                 wrote:
>>>
>>>
>>>
>>>
>>>                                                     Hi,
>>>
>>>                                                     I attach:
>>>                                                       - osd.20: one of the
>>>                                                         OSDs that I found
>>>                                                         makes other OSDs
>>>                                                         crash.
>>>                                                       - osd.23: one of the
>>>                                                         OSDs that crashes
>>>                                                         when I start
>>>                                                         osd.20.
>>>                                                       - mds: one of my
>>>                                                         MDSs.
>>>
>>>                                                     I cut the log files
>>>                                                     because they are too
>>>                                                     big. Everything is
>>>                                                     here:
>>>                                                     https://blondeau.users.greyc.fr/cephlog/
>>>
>>>                                                     Regards
>>>
>>>                                                     On 30/06/2014 17:35,
>>>                                                     Gregory Farnum wrote:
>>>
>>>                                                         What's the
>>>                                                         backtrace from the
>>>                                                         crashing OSDs?
>>>
>>>                                                         Keep in mind that
>>>                                                         as a dev release,
>>>                                                         it's generally best
>>>                                                         not to upgrade to
>>>                                                         unnamed versions
>>>                                                         like 0.82 (but it's
>>>                                                         probably too late
>>>                                                         to go back now).
>>>
>>>
>>>
>>>
>>>                                                     I will remember that
>>>                                                     next time ;)
>>>
>>>                                                         -Greg
>>>                                                         Software Engineer
>>>                                                         #42 @
>>>                                                         http://inktank.com
>>>                                                         | http://ceph.com
>>>
>>>                                                         On Mon, Jun 30,
>>>                                                         2014 at 8:06 AM,
>>>                                                         Pierre BLONDEAU
>>>                                                         <pierre.blondeau at unicaen.fr>
>>>                                                         wrote:
>>>
>>>
>>>
>>>                                                             Hi,
>>>
>>>                                                             After the
>>>                                                             upgrade to
>>>                                                             firefly, I had
>>>                                                             some PGs stuck
>>>                                                             in the peering
>>>                                                             state. I saw
>>>                                                             that 0.82 was
>>>                                                             out, so I tried
>>>                                                             upgrading to
>>>                                                             solve my
>>>                                                             problem.
>>>
>>>                                                             My three MDSs
>>>                                                             crash, and some
>>>                                                             OSDs trigger a
>>>                                                             chain reaction
>>>                                                             that kills
>>>                                                             other OSDs. I
>>>                                                             think my MDSs
>>>                                                             will not start
>>>                                                             because the
>>>                                                             metadata are on
>>>                                                             the OSDs.
>>>
>>>                                                             I have 36 OSDs
>>>                                                             on three
>>>                                                             servers, and I
>>>                                                             identified 5
>>>                                                             OSDs that make
>>>                                                             the others
>>>                                                             crash. If I do
>>>                                                             not start them,
>>>                                                             the cluster
>>>                                                             goes into a
>>>                                                             recovery state
>>>                                                             with 31 OSDs,
>>>                                                             but I have 378
>>>                                                             PGs in the
>>>                                                             down+peering
>>>                                                             state.
>>>
>>>                                                             What can I do?
>>>                                                             Would you like
>>>                                                             more
>>>                                                             information
>>>                                                             (OS, crash
>>>                                                             logs, etc.)?
>>>
>>>                                                             Regards
>>>
>>>
>>>
>>>
>>>                                     --
>>>                                     ------------------------------------------------
>>>                                     Pierre BLONDEAU
>>>                                     Administrateur Systèmes & réseaux
>>>                                     Université de Caen
>>>                                     Laboratoire GREYC, Département
>>>                                     d'informatique
>>>
>>>                                     tel     : 02 31 56 75 42
>>>                                     bureau  : Campus 2, Science 3, 406
>>>                                     ------------------------------------------------
>>>
>>>
>>>
>>>                         --
>>>                         ------------------------------------------------
>>>                         Pierre BLONDEAU
>>>                         Administrateur Systèmes & réseaux
>>>                         Université de Caen
>>>                         Laboratoire GREYC, Département d'informatique
>>>
>>>                         tel     : 02 31 56 75 42
>>>                         bureau  : Campus 2, Science 3, 406
>>>                         ------------------------------------------------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>     --
>>>     ------------------------------------------------
>>>     Pierre BLONDEAU
>>>     Administrateur Systèmes & réseaux
>>>     Université de Caen
>>>     Laboratoire GREYC, Département d'informatique
>>>
>>>     tel     : 02 31 56 75 42
>>>     bureau  : Campus 2, Science 3, 406
>>>     ------------------------------------------------
>>>
>>>
>>
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users at lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>


-- 
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com

