recovering from unhealthy state

Hi,


I've managed to get Ceph into an unhealthy state from which it will not
recover automatically. I ran a few 'ceph osd out X' commands and stopped
ceph-osd processes before the rebalancing had completed. (All in a test
environment :-) )

Now I see:

# ceph -w
  cluster 7fac9ad3-455e-4570-ae24-5c4311763bf9
   health HEALTH_WARN 12 pgs degraded; 9 pgs stale; 9 pgs stuck stale; 964 pgs stuck unclean; recovery 617/50262 degraded (1.228%)
   monmap e4: 3 mons at {n2=192.168.5.12:6789/0,node01=192.168.5.10:6789/0,node03=192.168.5.11:6789/0}, election epoch 126, quorum 0,1,2 n2,node01,node03
   osdmap e1462: 17 osds: 17 up, 10 in
    pgmap v198793: 4416 pgs: 3452 active+clean, 2 stale+active, 943 active+remapped, 12 active+degraded, 7 stale+active+remapped; 95639 MB data, 192 GB used, 15628 GB / 15821 GB avail; 0B/s rd, 110KB/s wr, 9op/s; 617/50262 degraded (1.228%)
   mdsmap e1: 0/0/1 up


2013-10-10 07:02:57.741031 mon.0 [INF] pgmap v198792: 4416 pgs: 3452 active+clean, 2 stale+active, 943 active+remapped, 12 active+degraded, 7 stale+active+remapped; 95639 MB data, 192 GB used, 15628 GB / 15821 GB avail; 0B/s rd, 17492B/s wr, 2op/s; 617/50262 degraded (1.228%)
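
As a sanity check on those numbers, here is a small Python sketch that
tallies the PG states from the pgmap line above. It only parses the text
as pasted and does not talk to the cluster; tally_pg_states is a made-up
helper name, not part of any ceph tooling.

```python
from collections import Counter

# State counts copied from the pgmap line in the `ceph -w` output above.
PGMAP = ("4416 pgs: 3452 active+clean, 2 stale+active, 943 active+remapped, "
         "12 active+degraded, 7 stale+active+remapped")


def tally_pg_states(summary):
    """Hypothetical helper: return (total, per-combination counts, per-flag counts)."""
    total_text, states_text = summary.split(" pgs: ", 1)
    combos = {}
    flags = Counter()
    for item in states_text.split(", "):
        count_text, combo = item.split(" ", 1)
        count = int(count_text)
        combos[combo] = count
        # A PG in "stale+active+remapped" counts once for each of its flags.
        for flag in combo.split("+"):
            flags[flag] += count
    return int(total_text), combos, flags


total, combos, flags = tally_pg_states(PGMAP)
not_clean = total - combos.get("active+clean", 0)

print("total pgs:       ", total)
print("sum of states:   ", sum(combos.values()))
print("not active+clean:", not_clean)
print("stale:           ", flags["stale"])
print("degraded:        ", flags["degraded"])
print("remapped:        ", flags["remapped"])
```

With the numbers above, the PGs that are not active+clean come to
4416 - 3452 = 964, which matches the "964 pgs stuck unclean" in the
health line, and the two stale entries (2 + 7) add up to the 9 stale PGs.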

I've seen some documentation at 
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

      * inactive - The placement group has not been active for too long
        (i.e., it hasn’t been able to service read/write requests).
      * unclean - The placement group has not been clean for too long
        (i.e., it hasn’t been able to completely recover from a previous
        failure).
      * stale - The placement group status has not been updated by a
        ceph-osd, indicating that all nodes storing this placement group
        may be down.

Which leaves 'remapped' and 'degraded' unexplained (though I can imagine
what they mean).

I presume I've lost some data. Alas. How do I get back to a clean state?
I mean, if you're stuck with lost data, you don't want to have the
cluster in an unhealthy state forever. I'd like to just cut my losses and
get on.


- Kees
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




