Write freeze when writing to rbd image and rebooting one of the nodes

Vasiliy Angapov <angapov@xxxxxxxxx> · Wed, 13 May 2015 10:39:42 +0400

Hi, colleagues!
I'm testing a simple Ceph cluster with a view to using it in a production environment. I have 8 OSDs (1 TB SATA drives), evenly distributed across 4 nodes.

I've mapped an RBD image on the client node and started writing a lot of data to it. Then I simply reboot one of the nodes and watch what happens. The result is very sad: writes freeze for about 20-30 seconds, which is long enough for the ext4 filesystem to switch to RO.
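
A minimal sketch of how such a write freeze could be measured from the client side, assuming the filesystem on the mapped image is mounted somewhere writable. The function name measure_write_stalls, the probe file path and the 1-second threshold are hypothetical and chosen only for illustration. The script repeatedly rewrites one small block with fsync and records how long each write takes:

```python
import os
import time


def measure_write_stalls(path, duration_s=60.0, block_size=4096, threshold_s=1.0):
    """Repeatedly write and fsync one block at offset 0 of `path`.

    Returns (max_latency, stalls), where `stalls` is a list of
    (seconds_since_start, latency) pairs for every write that took
    longer than `threshold_s` seconds.
    """
    block = b"\0" * block_size
    stalls = []
    max_latency = 0.0
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while time.monotonic() - start < duration_s:
            t0 = time.monotonic()
            os.pwrite(fd, block, 0)  # overwrite the same block so the file does not grow
            os.fsync(fd)             # force the write down to the block device
            latency = time.monotonic() - t0
            max_latency = max(max_latency, latency)
            if latency > threshold_s:
                stalls.append((t0 - start, latency))
    finally:
        os.close(fd)
    return max_latency, stalls


# Example call (hypothetical mount point of the filesystem on the RBD image):
#   worst, stalls = measure_write_stalls("/mnt/rbd/stall_probe.bin", duration_s=120.0)
#   print("longest write:", worst, "seconds; stalls:", stalls)
```

Running it while one node is rebooted should show the freeze as one entry in stalls with a latency in the tens of seconds. If the filesystem has already gone read-only, the write raises OSError instead.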

I wonder if there is any way to minimize this lag. AFAIK, ext filesystems have a 5-second timeout before switching to RO. So is there any way to get that lag below 5 seconds? I've tried lowering various OSD timeouts, but it doesn't seem to help.

How do you deal with such situations? 20 seconds of downtime is not tolerable in production.