Ceph in Production: best practice to monitor OSD up/down status

Hello,

I started working with Ceph a few weeks ago, so I might be asking a very
newbie question, but I could not find an answer in the docs or in the
mailing list archive for this.

Quick description of my setup:
I have a Ceph cluster with two servers. Each server has 3 SSD drives that I
use for journals only. To map SAS disks that keep their journal on the same
SSD drive to different failure domains, I wrote my own CRUSH map.
I now have a total of 36 OSDs, and ceph health returns HEALTH_OK.
I run the cluster with a couple of pools with size=3 and min_size=3.
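For context, the kind of layout I mean is sketched below. This is only a
simplified illustration, not my real map; the bucket type, names, ids and
weights are placeholders:

  # custom bucket type so that each journal SSD is its own failure domain
  type 0 osd
  type 1 journalgroup
  type 2 host
  type 3 root

  # all SAS OSDs that journal to the first SSD of server1
  journalgroup server1-ssd0 {
          id -10
          alg straw
          hash 0  # rjenkins1
          item osd.0 weight 1.000
          item osd.1 weight 1.000
          # ... remaining OSDs journaling to this SSD
  }
  # (host and root buckets containing the journalgroups are omitted here)

  # replicated rule that never places two copies behind the same journal SSD
  rule sas_by_journal {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type journalgroup
          step emit
  }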


Production operations questions:
I manually stopped some OSDs to simulate a failure.
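For example, I did something along these lines (the exact init commands
depend on the distro and how the daemons are managed):

  # stop one OSD daemon on one of the servers
  service ceph stop osd.12

  # watch how the cluster reacts
  ceph osd tree | grep down
  ceph -w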

As far as I understand, an "OSD down" condition is not enough to make
Ceph start creating new copies of objects; I noticed that I must also mark
the OSD "out" before Ceph produces new copies.
As far as I understand, min_size=3 puts an object in read-only mode if there
are not at least 3 copies of that object available.
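In other words, recovery only starts after something like:

  # mark the stopped OSD out so CRUSH re-maps its placement groups
  ceph osd out 12

  # backfill/recovery activity is then visible with
  ceph -w

and the pool settings I am referring to are the ones queried/set with:

  ceph osd pool get <poolname> size
  ceph osd pool get <poolname> min_size
  # e.g. lowering min_size would be
  ceph osd pool set <poolname> min_size 2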

Is this behavior correct, or did I make some mistake creating the cluster?
Should I expect Ceph to automatically produce a new copy of objects
when some OSDs are down?
Is there any option to automatically mark "out" OSDs that go "down"?
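(The closest thing I have found so far is the "mon osd down out interval"
setting, which, if I understand the docs correctly, makes the monitors mark
a "down" OSD "out" automatically after a timeout, e.g. in ceph.conf:

  [mon]
          # automatically mark a "down" OSD "out" after this many seconds
          mon osd down out interval = 300

while "ceph osd set noout" prevents that from happening. I am not sure this
is the recommended way to handle it in production, though.)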

thanks

Saverio



