Hi, colleagues!
I'm testing a simple Ceph cluster with a view to using it in a production environment. I have 8 OSDs (1 TB SATA drives), evenly distributed across 4 nodes.
I've mapped an RBD image on the client node and started writing a lot of data to it. Then I rebooted one node to see what would happen. What happens is very sad: writes freeze for about 20-30 seconds, which is long enough for the ext4 filesystem to switch to read-only (RO).
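For anyone wanting to reproduce the freeze, here is a minimal sketch of a writer that times each write+fsync and reports any that exceed a threshold. It is not the test I actually ran; it assumes the RBD image is formatted with ext4 and mounted at a hypothetical path such as `/mnt/rbd`, and the function name `measure_write_stalls` is made up for illustration.

```python
import os
import time


def measure_write_stalls(path, block_size=1 << 20, blocks=100, stall_threshold=1.0):
    """Write `blocks` chunks of `block_size` bytes to `path`, fsync after each,
    and return (max_latency_seconds, stalls), where `stalls` lists
    (block_index, latency) for every write+fsync slower than `stall_threshold`."""
    buf = os.urandom(block_size)
    stalls = []
    max_latency = 0.0
    # O_TRUNC so repeated runs start from an empty file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(blocks):
            start = time.monotonic()
            os.write(fd, buf)
            os.fsync(fd)  # force the data down to the RBD device
            latency = time.monotonic() - start
            max_latency = max(max_latency, latency)
            if latency > stall_threshold:
                stalls.append((i, latency))
    finally:
        os.close(fd)
    return max_latency, stalls


if __name__ == "__main__":
    # Hypothetical mount point of the mapped RBD image; reboot an OSD node
    # while this runs and look at the reported stalls.
    worst, slow = measure_write_stalls("/mnt/rbd/stall-test.bin", blocks=2000)
    print(f"worst write+fsync latency: {worst:.2f}s")
    for idx, lat in slow:
        print(f"block {idx}: {lat:.2f}s")
```

Calling fsync after every block makes each stall visible as a single slow call instead of being hidden in the page cache.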
I wonder if there is any way to minimize this lag. AFAIK, ext filesystems have a 5-second timeout before switching to RO, so is there any way to get the lag below 5 seconds? I've tried lowering various OSD timeouts, but it doesn't seem to help.
How do you deal with situations like this? 20 seconds of downtime is not tolerable in production.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com