Behaviour of a cluster with full OSD(s)

I understand that the "osd full" state should never be reached. As I am new to
Ceph, I want to be prepared for that case. I tried two different scenarios, and
here are my experiences:

The first scenario is to completely fill the storage (in my case by writing
files to a RADOS block device). I discovered that the writing client (dd, for
example) then gets completely stuck, and I cannot stop the process with SIGTERM
or SIGKILL. At the moment I reboot the whole machine to stop further writes to
the cluster. Then I unmap the rbd device and raise the full ratio a bit (from
0.95 to 0.97). I mount the image on my admin node and delete files until
everything is okay again.
Is this best practice? Is it possible to prevent the cluster from ever reaching
the "osd full" state? I could make the block devices smaller than the cluster
can hold, but it's hard to calculate this exactly.
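For reference, this is roughly the sequence I run at the moment (the exact
commands differ between releases, and the device, image, and pool names below
are only placeholders from my setup):

    # release the mapping after the client has been rebooted
    rbd unmap /dev/rbd0

    # raise the full ratio from 0.95 to 0.97 so the cluster accepts I/O again
    # (older releases: 'ceph pg set_full_ratio 0.97',
    #  newer releases: 'ceph osd set-full-ratio 0.97')
    ceph pg set_full_ratio 0.97

    # map and mount the image on the admin node, delete data, watch usage
    rbd map myimage --pool rbd
    mount /dev/rbd0 /mnt/rbd
    rm /mnt/rbd/<some large files>
    ceph df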

The next scenario is changing a pool's size from, say, 2 to 3 replicas. While
the cluster copies the objects, it gets stuck once an OSD reaches its limit.
Usually the OSD process then quits and I cannot restart it (even after setting
the replica count back). The only way I have found to recover is to manually
delete complete PG folders after identifying them with 'ceph pg dump'. Is this
the only way to get the cluster working again?
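To be clear about what I do today before deleting anything on disk, this is
roughly how I inspect the situation (pool name and OSD id are placeholders; on
my OSDs the PG folders live under /var/lib/ceph/osd/ceph-<id>/current/):

    # see which OSD is full and how much space is left
    ceph health detail
    ceph df

    # set the replica count back
    ceph osd pool set <poolname> size 2

    # dump the PG map and look for the PGs sitting on the full OSD
    ceph pg dump | grep <osd-id>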

Greetings!


