On Mon, Mar 23, 2015 at 7:17 AM, Saverio Proto <zioproto@xxxxxxxxx> wrote:
> Hello,
>
> thanks for the answers.
>
> This was exactly what I was looking for:
>
> mon_osd_down_out_interval = 900
>
> I was not waiting long enough to see my cluster recover by itself.
> That's why I tried to increase min_size, because I did not understand
> what min_size was for.
>
> Now that I know what min_size is, I guess the best setting for me is
> min_size = 1, because I would like to be able to perform I/O operations
> even if only 1 copy is left.

I'd strongly recommend leaving it at two: if you reduce it to 1 then you
can lose data by having just one disk die at an inopportune moment,
whereas if you leave it at 2 the system won't accept writes when only a
single hard drive holds the data. Leaving it at two, the system will
still try to re-replicate back up to three copies once "mon osd down out
interval" has elapsed after a failure. :)
-Greg

>
> Thanks to all for helping!
>
> Saverio
>
>
>
> 2015-03-23 14:58 GMT+01:00 Gregory Farnum <greg@xxxxxxxxxxx>:
>> On Sun, Mar 22, 2015 at 2:55 AM, Saverio Proto <zioproto@xxxxxxxxx> wrote:
>>> Hello,
>>>
>>> I started working with Ceph a few weeks ago, so I might be asking a
>>> very newbie question, but I could not find an answer in the docs or
>>> in the ml archive for this.
>>>
>>> Quick description of my setup:
>>> I have a Ceph cluster with two servers. Each server has 3 SSD drives
>>> that I use for journals only. To put SAS disks that journal to the
>>> same SSD drive into different failure domains, I wrote my own
>>> crushmap. I now have a total of 36 OSDs. Ceph health returns
>>> HEALTH_OK. I run the cluster with a couple of pools with size=3 and
>>> min_size=3.
>>>
>>>
>>> Production operations questions:
>>> I manually stopped some OSDs to simulate a failure.
>>>
>>> As far as I understood, an "OSD down" condition is not enough to make
>>> Ceph start making new copies of objects. I noticed that I must mark
>>> the OSD as "out" to make Ceph produce new copies.
>>> As far as I understood, min_size=3 makes objects read-only if there
>>> are not at least 3 copies of each object available.
>>
>> That is correct, but the default min_size with size 3 is 2, and you
>> probably want to use that instead. If you have size == min_size on
>> Firefly releases and lose an OSD, the PG can't do recovery and is
>> stuck without manual intervention. :( This is because of some quirks
>> in how OSD peering and recovery work, so you'd be forgiven for
>> thinking it would recover nicely.
>> (This is changed in the upcoming Hammer release, but you probably
>> still want to allow cluster activity when an OSD fails, unless you're
>> very confident in their uptime and more concerned about durability
>> than availability.)
>> -Greg
>>
>>>
>>> Is this behavior correct, or did I make a mistake creating the
>>> cluster?
>>> Should I expect Ceph to automatically produce new copies of objects
>>> when some OSDs are down?
>>> Is there any option to automatically mark "out" OSDs that go "down"?
>>>
>>> thanks
>>>
>>> Saverio
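
For reference, Greg's recommendation maps onto the standard pool
commands; this is a minimal sketch, and the pool name "mypool" is a
placeholder rather than anything from the thread:

  ceph osd pool set mypool size 3      # keep three replicas of each object
  ceph osd pool set mypool min_size 2  # serve I/O while at least two replicas are up
  ceph osd pool get mypool min_size    # verify the setting took effect

With size=3 and min_size=2, a single OSD failure leaves the pool
writable while recovery restores the third copy in the background.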
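
The down/out timer Saverio found lives in ceph.conf on the monitors; a
sketch using the 900-second value quoted in the thread:

  [mon]
  mon osd down out interval = 900   # seconds a "down" OSD waits before being marked "out"

It can also be changed on a running cluster without a restart, along
these lines:

  ceph tell mon.* injectargs '--mon_osd_down_out_interval 900'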
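
And for the manual-intervention side of the question, marking an OSD
out by hand (the id 12 here is a placeholder) and suppressing the
automatic down-to-out transition look like:

  ceph osd out 12        # trigger re-replication now, without waiting for the timer
  ceph osd set noout     # cluster-wide flag: never auto-mark down OSDs out
  ceph osd unset noout   # restore the default automatic behavior

The noout flag is handy during planned maintenance, when you expect
OSDs to go down and don't want the cluster to start re-replicating.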