On 03/07/2014 04:13 AM, David Bierce wrote:
Hello, I've been watching the design and features of Ceph with great eagerness, especially compared to the distributed file systems I currently use. One of the pains with VM workloads is when writes stall for more than a few seconds: virtual machines that think they are communicating with a real, live block device generally error out their file systems. In the case of ext? they remount as read-only; with other file and operating systems the behavior in that scenario is…erratic at best. It looks like the default write timeout for an OSD is 30 seconds. With the write consistency behavior that Ceph has, does that mean a write could stall on the client for up to 30 seconds in the event of an OSD failing to write, for whatever reason? If that is the case, is there a way around such a long timeout in block device terms, short of 1-second checks?
What timeout are you looking at? By default librados/librbd block forever, so there shouldn't be a timeout.
I've had multiple VMs hang for hours at a time when I broke a Ceph cluster, and after I fixed it the VMs started working again.
They only reported some "task blocked for more than 120 seconds" messages in their dmesg, but that's all.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on