>> On Thu, 2 Dec 2010, Christian Brunner wrote:
>>> We have simulated the simultaneous crash of multiple osds in our
>>> environment. After starting all the cosd again, we have the following
>>> situation:
>>>
>>> 2010-12-02 16:18:33.944436   pg v724432: 3712 pgs: 1 active, 3605
>>> active+clean, 1 crashed+peering, 46 down+peering, 56
>>> crashed+down+peering, 3 active+clean+inconsistent; 177 GB data, 365 GB
>>> used, 83437 GB / 83834 GB avail; 1/93704 degraded (0.001%)
>>>
>>> When I set off an "rbd rm" command for one of our rbd volumes, it seems
>>> to hit the "crashed+down+peering" pg. After that the command is
>>> stuck.
>>
>> The pg isn't active, so any IO will hang until peering completes. What
>> version of the code are you running? If it's something from unstable
>> from the last couple of weeks, it's probably related to problems there;
>> please upgrade and restart the osds. If it's the latest and greatest
>> 'rc', we should look at the logs to see what's going on!
>
> We are running 0.23 - I will upgrade to the latest 'rc' tomorrow.

Upgrading to the latest rc version worked well. Everything is working
again and all pgs except one are "active+clean". However, there is one pg
marked as "active+clean+inconsistent".

What can I do about an inconsistent pg? In general, a short description
of the possible pg states would be helpful.

Thanks,
Christian
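
For reference, a minimal sketch of how an inconsistent pg can be located
and repaired from the ceph CLI. The command names below are taken from
later Ceph releases and are not confirmed for 0.23; <pgid> is a
placeholder for the pg id the cluster reports:

    # list unhealthy pgs and the osds involved
    ceph health detail
    ceph pg dump | grep inconsistent

    # re-scrub the pg to confirm the inconsistency, then ask the
    # primary osd to repair it from the authoritative replica
    ceph pg scrub <pgid>
    ceph pg repair <pgid>

It is usually worth checking the osd logs for the original scrub error
before issuing the repair.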