Hi,

we are seeing the effect that single OSDs get marked down/out because they are sometimes too slow.

osd_pool_default_size = 3
osd_pool_default_min_size = 1

pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 15391 flags hashpspool stripe_width 0
pool 6 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 10945 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 7 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 10943 flags hashpspool stripe_width 0
max_osd 18

If a single OSD goes down/out, I expect the cluster to keep working, because everything is replicated 3 times. But the virtual servers (KVM), some accessing their disks via librbd and some via CephFS, get cut off from their virtual hard disks.

Why is that? To my understanding, if 1 OSD is gone and we replicate everything 3 times, and I assume Ceph is not so stupid as to put all 3 replicas on the same OSD, how can the cluster fail like that?

Thank you!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx
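P.S.: In case it is useful, these are roughly the commands one would run to inspect the cluster, OSD and PG state while one OSD is down; this is only a sketch, the pool name 'rbd' is taken from the dump above, the rest is standard ceph CLI:

  # overall cluster status and any health warnings
  ceph -s
  ceph health detail

  # which OSDs are up/down and where they sit in the CRUSH tree
  ceph osd tree

  # confirm the effective min_size of the pool in question
  ceph osd pool get rbd min_size

  # list PGs that are stuck unclean after the OSD went down
  ceph pg dump_stuck unclean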