On Thu, Oct 10, 2013 at 10:40 AM, Kees Bos <k.bos@xxxxxxxxxxx> wrote:
> On Thu, 2013-10-10 at 18:24 +0200, Gregory Farnum wrote:
>> On Wed, Oct 9, 2013 at 10:19 PM, Kees Bos <k.bos@xxxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I've managed to get ceph into an unhealthy state, from which it will not
>> > recover automatically. I've done some 'ceph osd out X' and stopped
>> > ceph-osd processes before the rebalancing was completed. (All in a test
>> > environment :-) )
>> >
>> > Now I see:
>> >
>> > # ceph -w
>> >   cluster 7fac9ad3-455e-4570-ae24-5c4311763bf9
>> >    health HEALTH_WARN 12 pgs degraded; 9 pgs stale; 9 pgs stuck stale; 964 pgs stuck unclean; recovery 617/50262 degraded (1.228%)
>> >    monmap e4: 3 mons at {n2=192.168.5.12:6789/0,node01=192.168.5.10:6789/0,node03=192.168.5.11:6789/0}, election epoch 126, quorum 0,1,2 n2,node01,node03
>> >    osdmap e1462: 17 osds: 17 up, 10 in
>> >    pgmap v198793: 4416 pgs: 3452 active+clean, 2 stale+active, 943 active+remapped, 12 active+degraded, 7 stale+active+remapped; 95639 MB data, 192 GB used, 15628 GB / 15821 GB avail; 0B/s rd, 110KB/s wr, 9op/s; 617/50262 degraded (1.228%)
>> >    mdsmap e1: 0/0/1 up
>> >
>> > 2013-10-10 07:02:57.741031 mon.0 [INF] pgmap v198792: 4416 pgs: 3452 active+clean, 2 stale+active, 943 active+remapped, 12 active+degraded, 7 stale+active+remapped; 95639 MB data, 192 GB used, 15628 GB / 15821 GB avail; 0B/s rd, 17492B/s wr, 2op/s; 617/50262 degraded (1.228%)
>> >
>> > I've seen some documentation at
>> > http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>> >
>> > * inactive - The placement group has not been active for too long
>> >   (i.e., it hasn't been able to service read/write requests).
>> > * unclean - The placement group has not been clean for too long
>> >   (i.e., it hasn't been able to completely recover from a previous
>> >   failure).
>> > * stale - The placement group status has not been updated by a
>> >   ceph-osd, indicating that all nodes storing this placement group
>> >   may be down.
>> >
>> > Which leaves 'remapped' and 'degraded' unexplained (though I can imagine
>> > what they mean).
>> >
>> > I presume I've lost some data. Alas. How do I get to a clean state again?
>> > I mean, if you're stuck with lost data, you don't want to have the
>> > cluster in an unhealthy state forever. I'd like to just cut my losses and
>> > get on.
>>
>> Have you figured out what's going on with the stale PGs (and the OSDs
>> hosting them), following the instructions at that Troubleshooting
>> link? At some point it will probably become necessary to declare OSDs
>> lost, and then Ceph will give up on the data and move on. If you still
>> have any of those "down" OSDs, turning them back on should resolve
>> things, and any of them which you've actually decommissioned you need
>> to remove from the map. Your remapped PGs will probably get better if
>> you mark the up+out OSDs back into the cluster.
>
> Nah, I've really mangled the cluster. The osds with problems have been
> replaced with empty ones (first taken out of the cluster, then added back
> in as new (empty) osds with the same ids).
>
> I could try to mark the osds that are reported as having stuck pgs as out,
> then mark them (the osds) as lost, then remove them and create new
> ones. Does that make sense?
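
(For reference, the sequence described above would look roughly like the
following, once per affected OSD; X is a placeholder for the osd id, and
these are just the usual manual removal steps from the docs rather than
anything specific to this cluster:

  # ceph osd out X
  # ceph osd lost X --yes-i-really-mean-it
  # ceph osd crush remove osd.X
  # ceph auth del osd.X
  # ceph osd rm X

The 'lost' step is the one that tells the cluster to give up waiting for
that OSD's copies of the stuck PGs; the remaining steps remove it from the
CRUSH map, the auth database and the OSD map so a fresh OSD can be created
in its place.)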
Hrm. It's possible you're going to have some problems with this, then.
But first you should try to figure out the state of the stuck PGs. In
particular, marking OSDs "out" won't prevent them from being used for
recovery, and if you mark them all out and shut them down then you'll lose
other PGs. (It might work if you let all the existing PGs migrate off of
them and recover, then take them down and mark lost... I've never gone
through the recovery models with an eye to what happens if you tell the
cluster it has an OSD that doesn't match up with the OSD that was lost.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
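
For completeness, a rough sketch of the kind of inspection being suggested
before declaring anything lost (the pgid 2.5 below is only an example):

  # ceph health detail
  # ceph pg dump_stuck stale
  # ceph pg dump_stuck unclean
  # ceph osd tree
  # ceph pg 2.5 query

'ceph health detail' and the dump_stuck variants list the problem PGs,
'ceph osd tree' shows which OSDs are up/down and in/out, and querying an
individual PG (when an OSD holding it is up to answer) shows which OSDs it
is mapped to and what it is waiting for. Marking the up+out OSDs back in
('ceph osd in X') should take care of the remapped PGs, per the earlier
reply.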