I have a 4-node cluster, shown by `ceph osd tree` below. Monitors are running on hosts 1, 2 and 3. There is a single replicated pool of size 3. I have a VM whose hard drive is replicated across OSDs 11 (host3), 5 (host1) and 3 (host2).

I can 'fail' any one host by disabling its SAN network interface, and the VM keeps running with only a slowdown in I/O performance, just as expected. However, if I 'fail' both hosts 3 and 4, I/O hangs on the VM (i.e. `df` never completes, etc.). The monitors on hosts 1 and 2 still have quorum, so that shouldn't be an issue, and the placement group still has 2 of its 3 replicas online.

Why does I/O hang even though host4 isn't running a monitor and has nothing to do with my VM's hard drive?

Size?

# ceph osd pool get rbd size
size: 3

Where's rbd_id.vm-100-disk-1?

# ceph osd getmap -o /tmp/map && osdmaptool --pool 0 --test-map-object rbd_id.vm-100-disk-1 /tmp/map
got osdmap epoch 1043
osdmaptool: osdmap file '/tmp/map'
 object 'rbd_id.vm-100-disk-1' -> 0.1ea -> [11,5,3]

# ceph osd tree
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 8.06160 root default
-7 5.50308     room A
-3 1.88754         host host1
 4 0.40369             osd.4       up  1.00000          1.00000
 5 0.40369             osd.5       up  1.00000          1.00000
 6 0.54008             osd.6       up  1.00000          1.00000
 7 0.54008             osd.7       up  1.00000          1.00000
-2 3.61554         host host2
 0 0.90388             osd.0       up  1.00000          1.00000
 1 0.90388             osd.1       up  1.00000          1.00000
 2 0.90388             osd.2       up  1.00000          1.00000
 3 0.90388             osd.3       up  1.00000          1.00000
-6 2.55852     room B
-4 1.75114         host host3
 8 0.40369             osd.8       up  1.00000          1.00000
 9 0.40369             osd.9       up  1.00000          1.00000
10 0.40369             osd.10      up  1.00000          1.00000
11 0.54008             osd.11      up  1.00000          1.00000
-5 0.80737         host host4
12 0.40369             osd.12      up  1.00000          1.00000
13 0.40369             osd.13      up  1.00000          1.00000

-- 
Adam Carheden

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com