Hi,

Try mounting your filesystems with the errors=continue option.

From the mount(8) man page:

errors={continue|remount-ro|panic}
    Define the behaviour when an error is encountered. (Either ignore errors and just mark the filesystem erroneous and continue, or remount the filesystem read-only, or panic and halt the system.) The default is set in the filesystem superblock, and can be changed using tune2fs(8).

----- Original message -----
From: "Paulo Almeida" <palmeida@xxxxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Monday, 24 November 2014 17:06:40
Subject: Virtual machines using RBD remount read-only on OSD slow requests

Hi,

I have a Ceph cluster with 4 disk servers, 14 OSDs and a replica size of 3. A number of KVM virtual machines use RBD as their only storage device. Whenever some OSDs (always on a single server) have slow requests, most virtual machines remount their disk read-only and need to be rebooted. I believe the slow requests are caused by flaky hardware or, on one occasion, by a S.M.A.R.T. command that crashed the system disk of one of the disk servers.

One of the virtual machines still has Debian 6 installed, and it never crashes. It also has an ext3 filesystem, unlike some other machines, which have ext4. ext3 does crash on systems with Debian 7, but those have different mount flags, such as "barrier" and "data=ordered". I suspect (but haven't tested) that using "nobarrier" may solve the problem, but that doesn't seem to be an ideal solution.

Most of those machines have Debian 7 or Ubuntu 12.04, but two of them have Ubuntu 14.04 (and thus a more recent kernel) and they also remount read-only.

I searched the mailing list and found a couple of relevant messages. One person seemed to have the same problem[1], but someone else replied that it didn't happen in his case ("I've had multiple VMs hang for hours at a time when I broke a Ceph cluster and after fixing it the VMs would start working again"). The other message[2] is not very informative.

Are other people experiencing this problem? Is there a file system or kernel version that is recommended for KVM guests that would prevent it? Or does this problem indicate that something else is wrong and should be fixed? I did configure all machines to use "cache=writeback", but never investigated whether that makes a difference or even whether it is actually working.

Thanks,

Paulo Almeida
Instituto Gulbenkian de Ciência, Oeiras, Portugal

[1] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/8011
[2] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/1742

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
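
As a follow-up to the errors= suggestion at the top of this thread: before changing anything, it may help to see which error behaviour each guest filesystem is currently mounted with. The following is only a rough sketch in Python, assuming the guest exposes its mounts in the usual /proc/mounts layout (device, mount point, type, options). The function names errors_behaviour and report are made up for this example and are not part of any tool. A result of None only means that no errors= option appears in the option string, in which case the superblock default mentioned in the man page excerpt would apply.

```python
from typing import Dict, Optional

# The three values listed in the mount(8) excerpt quoted in this thread.
VALID_BEHAVIOURS = {"continue", "remount-ro", "panic"}


def errors_behaviour(mount_options: str) -> Optional[str]:
    """Return the errors= value from a comma-separated option string, or None."""
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "errors" and value in VALID_BEHAVIOURS:
            return value
    return None


def report(mounts_text: str) -> Dict[str, Optional[str]]:
    """Map mount point to errors= behaviour for ext3/ext4 lines in /proc/mounts-style text."""
    result: Dict[str, Optional[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in ("ext3", "ext4"):
            result[mount_point] = errors_behaviour(options)
    return result
```

Inside a guest, one would pass the contents of /proc/mounts to report() and look for mount points that show "remount-ro". Those are the filesystems where trying errors=continue, or changing the superblock default with tune2fs(8), would make a difference.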