Re: Cluster Map Problems

Martin Mailand <martin@xxxxxxxxxxxx> · Thu, 28 Mar 2013 02:44:33 +0100

Hi John,

If I shut down an OSD, the cluster stays in a degraded state for hours and
there is no recovery traffic at all.

-martin

On 28.03.2013 02:25, John Wilkins wrote:
> So the OSD you shut down is down and in. How long does it stay in the
> degraded state? In the docs here,
> http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/ , we
> discuss the notion that a down OSD is not technically out of the
> cluster for a while. I believe the default value is 300 seconds, which
> is 5 minutes. From what I can see from your "ceph osd tree"
> command, all your OSDs are running. You can change the time it takes
> to mark a down OSD out. That's "mon osd down out interval", discussed
> in this section:
> http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/#degraded
> 
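
To make the timing John describes concrete, here is a minimal Python sketch. It assumes the 300-second default he quotes for "mon osd down out interval" and that nothing else stops the monitors from marking the OSD out. The function names are hypothetical and are not part of Ceph.

```python
from datetime import datetime, timedelta

# Assumed default for "mon osd down out interval", as quoted in the thread.
DOWN_OUT_INTERVAL = timedelta(seconds=300)


def expected_out_time(marked_down_at: datetime,
                      interval: timedelta = DOWN_OUT_INTERVAL) -> datetime:
    """Earliest time a down OSD would be marked out under the assumptions above."""
    return marked_down_at + interval


def should_be_out(marked_down_at: datetime, now: datetime,
                  interval: timedelta = DOWN_OUT_INTERVAL) -> bool:
    """True once the OSD has been down for at least the interval."""
    return now - marked_down_at >= interval
```

Under these assumptions, an OSD that has been down for hours is far past the interval, so the time since shutdown is worth comparing against the value actually set in the cluster's configuration.
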
> On Wed, Mar 27, 2013 at 5:56 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>> Hi,
>>
>> that's the config http://pastebin.com/2JzABSYt
>> ceph osd dump http://pastebin.com/GSCGKL1k
>> ceph osd tree http://pastebin.com/VSgPFRYv
>>
>> As far as I can tell they are not mapped right.
>>
>> osdmap e133 pool 'rbd' (2) object '2.31a' -> pg 2.f3caaf00 (2.300) -> up
>> [13,23] acting [13,23]
>>
>> -martin
>>
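
The mapping line quoted above can be checked mechanically. The following is a rough Python sketch that parses a line in the format shown in this thread (joined onto one line) and tests whether each OSD in the set sits in a different rack. The `rack_of` lookup is hypothetical: the thread does not spell out which OSD lives in which rack, so it would have to be filled in from the "ceph osd tree" output. Other Ceph versions may print this line differently.

```python
import re

# Mapping line as quoted in the thread, joined onto a single line.
LINE = ("osdmap e133 pool 'rbd' (2) object '2.31a' -> pg 2.f3caaf00 (2.300) "
        "-> up [13,23] acting [13,23]")

# Matches the part of the line from "pool" onward.
PATTERN = re.compile(
    r"pool '(?P<pool>[^']+)' \((?P<pool_id>\d+)\) "
    r"object '(?P<obj>[^']+)' -> pg \S+ \((?P<pgid>[^)]+)\) "
    r"-> up \[(?P<up>[\d,]*)\] acting \[(?P<acting>[\d,]*)\]"
)


def parse_mapping(line: str) -> dict:
    """Extract pool, PG id, and the up/acting OSD sets from a mapping line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognised mapping line")

    def to_ids(text: str) -> list:
        return [int(x) for x in text.split(",") if x]

    return {
        "pool": m["pool"],
        "pgid": m["pgid"],
        "up": to_ids(m["up"]),
        "acting": to_ids(m["acting"]),
    }


def one_replica_per_rack(osds: list, rack_of: dict) -> bool:
    """True if no two OSDs in the set share a rack (rack_of is a hypothetical lookup)."""
    racks = [rack_of[osd] for osd in osds]
    return len(set(racks)) == len(racks)
```

Running `parse_mapping` on the output before and after stopping a host would show whether the up set changes, which is the remapping check John asks about.
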
>> On 28.03.2013 01:09, John Wilkins wrote:
>>> We need a bit more information. If you can do: "ceph osd dump", "ceph
>>> osd tree", and paste your ceph conf, we might get a bit further. The
>>> CRUSH hierarchy looks okay. I can't see the replica size from this
>>> though.
>>>
>>> Have you followed this procedure to see if your object is getting
>>> remapped? http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/#finding-an-object-location
>>>
>>> On Thu, Mar 21, 2013 at 12:02 PM, Martin Mailand <martin@xxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> I want to change my crushmap to reflect my setup: I have two racks with
>>>> 3 hosts each. For the rbd pool I want to use a replication size of 2.
>>>> The failure domain should be the rack, so there should be one replica in
>>>> each rack. That works so far.
>>>> But if I shut down a host, the cluster stays degraded. What I want is for
>>>> the now-missing replicas to be re-created on the two remaining hosts in
>>>> that rack.
>>>>
>>>> Here is the crushmap:
>>>> http://pastebin.com/UaB6LfKs
>>>>
>>>> Any idea what I did wrong?
>>>>
>>>> -martin
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com