Hi,

we are seeing the effect that single OSDs get marked down/out because they are sometimes too slow.

osd_pool_default_size = 3
osd_pool_default_min_size = 1

pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 15391 flags hashpspool stripe_width 0
pool 6 'cephfs_data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 10945 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 7 'cephfs_metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 10943 flags hashpspool stripe_width 0
max_osd 18

If a single OSD goes down/out, I expect the cluster to keep working, because everything is replicated 3 times. But the virtual servers (KVM), some accessing their disks via librbd and some via CephFS, get cut off from their virtual hard disks.

Why is that? To my understanding, if 1 OSD is gone and we replicate everything 3 times, and I assume Ceph is not so stupid as to put all 3 replicas on the same OSD, how can the cluster fail like that?

Thank you!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive
mailto:info@xxxxxxxxxxxxxxxxx
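P.S.: In case it is useful, these are roughly the commands one would run to inspect the cluster, OSD and PG state while one OSD is down; this is only a sketch, the pool name 'rbd' is taken from the dump above, the rest is standard ceph CLI:

  # overall cluster status and any health warnings
  ceph -s
  ceph health detail

  # which OSDs are up/down and where they sit in the CRUSH tree
  ceph osd tree

  # confirm the effective min_size of the pool in question
  ceph osd pool get rbd min_size

  # list PGs that are stuck unclean after the OSD went down
  ceph pg dump_stuck unclean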