Re: Destroyed Ceph Cluster

Georg Höllrigl <georg.hoellrigl@xxxxxxxxxx> · Fri, 23 Aug 2013 07:34:37 +0200

Thank you, it works now as expected.
I removed the MDS. As soon as the second OSD machine came up, the other
errors cleared by themselves!?

On 19.08.2013 18:28, Gregory Farnum wrote:
Have you ever used the FS? It's missing an object which we're
intermittently seeing failures to create (on initial setup) when the
cluster is unstable.
If so, clear out the metadata pool and check the docs for "newfs".
-Greg

On Monday, August 19, 2013, Georg Höllrigl wrote:

    Hello List,

    The trouble with fixing this cluster continues... I now get output
    like this:

    # ceph health
    HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; mds cluster is degraded; mds vvx-ceph-m-03 is laggy
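A one-line health summary like the one above is a status word followed by semicolon-separated items, so it can be split mechanically when watching a recovery. The following is only a minimal sketch: the helper name parse_health is hypothetical, and it assumes the summary has exactly the shape shown in this thread.

```python
def parse_health(summary):
    """Split a one-line `ceph health` summary into (status, [items]).

    Assumes the format seen in this thread: a status word, a space,
    then items separated by ';'. The function name is hypothetical.
    """
    status, _, detail = summary.strip().partition(" ")
    items = [part.strip() for part in detail.split(";") if part.strip()]
    return status, items


# Health line copied from the message above.
line = ("HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; "
        "mds cluster is degraded; mds vvx-ceph-m-03 is laggy")

status, items = parse_health(line)
print(status)
for item in items:
    print(" -", item)
```

Listing the items one per line makes it easier to see which warnings disappear as OSDs come back.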

    When checking for the ceph-mds processes, there are now none left...
    no matter which server I check. And they won't start up again!?

    The MDS log starts with:
    2013-08-19 11:23:30.503214 7f7e9dfbd780  0 ceph version 0.67 (e3b7bc5bce8ab330ec1661381072368af3c218a0), process ceph-mds, pid 27636
    2013-08-19 11:23:30.523314 7f7e9904b700  1 mds.-1.0 handle_mds_map standby
    2013-08-19 11:23:30.529418 7f7e9904b700  1 mds.0.26 handle_mds_map i am now mds.0.26
    2013-08-19 11:23:30.529423 7f7e9904b700  1 mds.0.26 handle_mds_map state change up:standby --> up:replay
    2013-08-19 11:23:30.529426 7f7e9904b700  1 mds.0.26 replay_start
    2013-08-19 11:23:30.529434 7f7e9904b700  1 mds.0.26  recovery set is
    2013-08-19 11:23:30.529436 7f7e9904b700  1 mds.0.26  need osdmap epoch 277, have 276
    2013-08-19 11:23:30.529438 7f7e9904b700  1 mds.0.26  waiting for osdmap 277 (which blacklists prior instance)
    2013-08-19 11:23:30.534090 7f7e9904b700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
    2013-08-19 11:23:30.535483 7f7e9904b700 -1 mds/SessionMap.cc: In function 'void SessionMap::_load_finish(int, ceph::bufferlist&)' thread 7f7e9904b700 time 2013-08-19 11:23:30.534107
    mds/SessionMap.cc: 83: FAILED assert(0 == "failed to load sessionmap")
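The line that matters in a log like this is the final "FAILED assert", which names the source file, line number and failed expression. The following is only a rough sketch of pulling those lines out of a longer log. The helper name find_failed_asserts and the regular expression are assumptions based solely on the single assert line shown above, not on any documented Ceph log format.

```python
import re

# Pattern derived only from the assert line quoted in this thread
# (an assumption, not a documented format): "<file>: <line>: FAILED assert(<expr>)"
ASSERT_RE = re.compile(
    r'^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)$'
)


def find_failed_asserts(log_text):
    """Return (file, line, expression) for every FAILED assert line."""
    hits = []
    for raw in log_text.splitlines():
        match = ASSERT_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("expr"))
            )
    return hits


# Two lines copied from the log above.
sample = "\n".join([
    "2013-08-19 11:23:30.534090 7f7e9904b700 -1 mds.0.sessionmap "
    "_load_finish got (2) No such file or directory",
    'mds/SessionMap.cc: 83: FAILED assert(0 == "failed to load sessionmap")',
])

for source_file, line_no, expr in find_failed_asserts(sample):
    print(source_file, line_no, expr)
```

Here the assert follows directly after the "No such file or directory" error from the sessionmap load, which fits a missing object in the metadata pool.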

    Does anyone have an idea how to get the cluster running again?

    Georg

    On 16.08.2013 16:23, Mark Nelson wrote:

        Hi Georg,

        I'm not an expert on the monitors, but that's probably where I
        would start.  Take a look at your monitor logs and see if you can
        get a sense for why one of your monitors is down.  Some of the
        other devs who will probably be around later might know whether
        there are any known issues with recreating the OSDs and missing
        PGs.

        Mark

        On 08/16/2013 08:21 AM, Georg Höllrigl wrote:

            Hello,

            I'm still evaluating Ceph, now with a test cluster running
            0.67 (dumpling).
            I created the setup with ceph-deploy from Git.
            I recreated a bunch of OSDs to give them another journal.
            There was already some test data on these OSDs.
            I have already recreated the missing PGs with
            "ceph pg force_create_pg".

            HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 5 requests are blocked > 32 sec; mds cluster is degraded; 1 mons down, quorum 0,1,2 vvx-ceph-m-01,vvx-ceph-m-02,vvx-ceph-m-03

            Any idea how to fix the cluster, other than completely
            rebuilding it from scratch? What if such a thing happens in a
            production environment...

            The PGs in "ceph pg dump" have all looked like this, stuck in
            creating, for some time now:

            2.3d    0    0    0    0    0    0    0    creating    2013-08-16 13:43:08.186537    0'0    0:0    []    []    0'0    0.000000    0'0    0.000000

            Is there a way to just dump the data that was on the discarded
            OSDs?

            Kind Regards,
            Georg
            _________________________________________________
            ceph-users mailing list
            ceph-users@xxxxxxxxxxxxxx
            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Dipl.-Ing. (FH) Georg Höllrigl
Engineering

________________________________________________________________________________

Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel:     +43 (0) 2983 201 - 30505
Fax:     +43 (0) 2983 201 - 930505
Email:   georg.hoellrigl@xxxxxxxxxx
Web:     http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024

________________________________________________________________________________

CONFIDENTIAL!
This email contains confidential information and is intended for the
authorised recipient only. If you are not an authorised recipient, please
return the email to us and then delete it from your computer and
mail-server. You may neither use nor edit any such emails including
attachments, nor make them accessible to third parties in any manner
whatsoever.
Thank you for your cooperation

________________________________________________________________________________ 
