I am testing erasure-coded pools and running a rados test write to check
fault tolerance. I have 3 nodes with 1 OSD each, K=2 M=1.

While performing the write (rados bench -p replicate 100 write), I stop one of the OSD daemons (for example osd.0), simulating a node failure, and then the whole write stops and I can't write any data anymore.

    1      16        28        12   46.8121        48     1.01548    0.616034
    2      16        40        24   47.3907        48     1.04219    0.923728
    3      16        52        36   47.5889        48    0.593145      1.0038
    4      16        68        52   51.6633        64     1.39638     1.08098
    5      16        74        58    46.158        24     1.02699     1.10172
    6      16        83        67   44.4711        36     3.01542     1.18012
    7      16        95        79   44.9722        48    0.776493     1.24003
    8      16        95        79   39.3681         0           -     1.24003
    9      16        95        79   35.0061         0           -     1.24003
   10      16        95        79   31.5144         0           -     1.24003
   11      16        95        79   28.6561         0           -     1.24003
   12      16        95        79   26.2732         0           -     1.24003

It's pretty clear where the OSD failed.

On the other hand, using a replicated pool, the client (rados test) doesn't even notice the OSD failure, which is awesome.

Is this normal behaviour on EC pools?

Jorge Pinilla López
jorpilo@xxxxxxxxx
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com