Re: Can't mount Cephfs

Jan Schermer <jan@xxxxxxxxxxx> · Wed, 26 Aug 2015 12:26:46 +0200

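If you lost 3 disks with size 2 and at least 2 of those disks were in different hosts, that means you lost data with the default CRUSH rule. With only two replicas, any placement group whose two copies both sat on the failed disks has no surviving copy. There's nothing you can do but either get those disks back in or recover from backup.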
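
To make the reasoning concrete, here is a minimal sketch of it in Python. Everything in it is made up for illustration: the PG ids, the PG-to-OSD placement, the failed OSD numbers, and the assumption that OSD n lives on host n // 4 (four disks per node, as in the cluster described below). It is not read from a real cluster and is not how Ceph computes placement; it only shows why losing disks in different hosts with size 2 can leave some PGs with no copy at all.

```python
def hosts_hit(failed_osds, osds_per_host=4):
    """Return the set of hosts that lost at least one OSD.

    Assumes (hypothetically) that OSD n sits on host n // osds_per_host.
    """
    return {osd // osds_per_host for osd in failed_osds}


def lost_pgs(pg_to_osds, failed_osds):
    """Return the PG ids whose every replica was on a failed OSD."""
    failed = set(failed_osds)
    return sorted(pg for pg, osds in pg_to_osds.items() if set(osds) <= failed)


# Hypothetical placement with size 2: each PG has two replicas,
# always on OSDs in different hosts (host failure domain).
pg_to_osds = {
    "1.0": [0, 4],
    "1.1": [1, 8],
    "1.2": [4, 9],
    "1.3": [0, 9],
    "1.4": [5, 12],
}

# Three failed disks spread over three hosts: some PGs lose both copies.
print(hosts_hit([0, 4, 9]))             # {0, 1, 2}
print(lost_pgs(pg_to_osds, [0, 4, 9]))  # ['1.0', '1.2', '1.3']

# Three failed disks in a single host: every PG keeps one copy elsewhere.
print(hosts_hit([0, 1, 2]))             # {0}
print(lost_pgs(pg_to_osds, [0, 1, 2]))  # []
```

In this toy model the PGs returned by lost_pgs have no replica left to recover from, which is roughly what an 'incomplete' PG means once the original disks are gone.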

Jan

On 26 Aug 2015, at 12:18, Andrzej Łukawski <alukawski@xxxxxxxxxx> wrote:

    Hi,

    We have a Ceph cluster (Ceph version 0.94.2) which consists of four
    nodes with four disks on each node. Ceph is configured to hold two
    replicas (size 2). We use this cluster for the Ceph filesystem. A few
    days ago we had a power outage, after which I had to replace three of
    our cluster's OSD disks. All OSD disks are now online, but I'm unable
    to mount the filesystem and constantly receive 'mount error 5 =
    Input/output error'. Ceph status shows many 'incomplete' pgs and
    that 'mds cluster is degraded'. According to 'ceph health detail',
    the mds is replaying its journal.

    [root@cnode0 ceph]# ceph -s
        cluster 39c717a3-5e15-4e5e-bc54-7e7f1fd0ee24
         health HEALTH_WARN
                25 pgs backfill_toofull
                10 pgs degraded
                126 pgs down
                263 pgs incomplete
                54 pgs stale
                10 pgs stuck degraded
                263 pgs stuck inactive
                54 pgs stuck stale
                289 pgs stuck unclean
                10 pgs stuck undersized
                10 pgs undersized
                4 requests are blocked > 32 sec
                recovery 27139/10407227 objects degraded (0.261%)
                recovery 168597/10407227 objects misplaced (1.620%)
                4 near full osd(s)
                too many PGs per OSD (312 > max 300)
                mds cluster is degraded
         monmap e6: 6 mons at {0=x.x.70.1:6789/0,0m=x.x.71.1:6789/0,1=x.x.70.2:6789/0,1m=x.x.71.2:6789/0,2=x.x.70.3:6789/0,2m=x.x.71.3:6789/0}
                election epoch 2958, quorum 0,1,2,3,4,5 0,1,2,0m,1m,2m
         mdsmap e1236: 1/1/1 up {0=2=up:replay}, 2 up:standby
         osdmap e83705: 16 osds: 16 up, 16 in; 26 remapped pgs
          pgmap v40869228: 2496 pgs, 3 pools, 16952 GB data, 5046 kobjects
                32825 GB used, 11698 GB / 44524 GB avail
                27139/10407227 objects degraded (0.261%)
                168597/10407227 objects misplaced (1.620%)
                    2153 active+clean
                     137 incomplete
                     126 down+incomplete
                      54 stale+active+clean
                      15 active+remapped+backfill_toofull
                      10 active+undersized+degraded+remapped+backfill_toofull
                       1 active+remapped
    [root@cnode0 ceph]#

    I wasn't able to find any solution on the Internet, and I worry I
    will make things even worse if I continue to troubleshoot this on my
    own. I'm stuck. Could you please help?

    Thanks.

    Andrzej

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
