Re: Writing behavior in CEPH for VM using RBD

Wido den Hollander <wido@xxxxxxxx> · Fri, 07 Mar 2014 14:40:04 +0100

On 03/07/2014 04:13 AM, David Bierce wrote:
Ello —

I’ve been following the design and features of Ceph with great eagerness, especially compared to the distributed file systems I currently use.  One of the pains with VM workloads is that when writes stall for more than a few seconds, virtual machines that think they are communicating with a real, live block device generally error out their file systems. In the case of ext?, they remount as read-only; with other file and operating systems, the behavior in that scenario is… erratic at best.

It looks like the default write timeout for an OSD is 30 seconds.  With the write consistency behavior that Ceph has, does that mean a write could be stalled by the client for up to 30 seconds in the event of an OSD failing to write, for whatever reason?  If that is the case, is there a way around such a long timeout in block device terms, short of 1 second checks?

What timeout are you looking at? By default librados/librbd block
forever, so there shouldn't be a timeout.

I've had multiple VMs hang for hours at a time when I broke a Ceph 
cluster and after fixing it the VMs would start working again.

They only reported some "task blocked for more than 120 seconds"
messages in their dmesg, but that's all.
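
For anyone who wants to check a guest for these messages after an outage, here is a minimal Python sketch that scans kernel log text for the hung-task warning. The 120 seconds is the guest kernel's hung-task watchdog threshold (kernel.hung_task_timeout_secs), not a Ceph setting. The function name, the regular expression, and the sample log lines are my own illustration and are not taken from any real VM in this thread; the exact log prefix may differ between kernel versions.

```python
import re

# Shape of the kernel's hung-task warning, e.g.
# "INFO: task jbd2/vda1-8:250 blocked for more than 120 seconds."
# Assumption: the message format may vary slightly between kernel versions.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a (task name, pid, seconds) tuple for each hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return hits


# Hypothetical sample log, made up for illustration only.
SAMPLE_LOG = """\
[ 1234.567890] INFO: task jbd2/vda1-8:250 blocked for more than 120 seconds.
[ 1234.567900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1354.567890] INFO: task rsyslogd:812 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for comm, pid, secs in find_hung_tasks(SAMPLE_LOG):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

On a real guest you would feed it the output of dmesg instead of SAMPLE_LOG. Seeing only these warnings, with no I/O errors and no read-only remount, would match what I described above: the writes were stuck, not failed.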

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on