What release of Infernalis are you running? When you encounter this error, is the partition table zeroed out or does it appear to be random corruption?

--
Jason Dillaman

----- Original Message -----
> From: "Udo Waechter" <root@xxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Saturday, February 6, 2016 5:31:51 AM
> Subject: SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>
> Hello,
>
> I am experiencing totally weird filesystem corruptions with the
> following setup:
>
> * Ceph infernalis on Debian8
> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)
>
> The SSDs are new in my setup and I am trying to set up a cache tier.
>
> With the spinning disks, Ceph has been running for about a year without
> any major issues. Replacing disks and all that went fine.
>
> Ceph is used by rbd+libvirt+kvm with
>
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
>
> Also, in libvirt, I have
>
> cachemode=writeback enabled.
>
> So far so good.
>
> Now, I've added the SSD-Cache tier to the picture with "cache-mode
> writeback"
>
> The SSD-Machine also has the "deadline" scheduler enabled.
>
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed".
> Trying to reboot the machines ends in "No bootable drive".
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.
>
> testdisk finds the proper ones, but e2fsck afterwards repairs the
> filesystem beyond usability.
>
> This does not happen to all machines; it happens to those that actually
> do some or most of the IO:
>
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
>
> So I thought; but yesterday one of my ldap-servers died, and that one is
> not doing IO.
>
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
>
> I've read through some older mails on the list, where people had similar
> problems and suspected something like that.
>
> What are the proper/right settings for rbd/qemu/libvirt?
>
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD-tier: cachemode: writeback
>
> ?
>
> Thanks for any help,
> udo.
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
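
One way to answer the question at the top of this message (zeroed partition table versus random corruption) is to look at the first 512-byte sector of the affected image. The following is only a minimal sketch, not a tested diagnostic: the function names are made up for illustration, and the device path in the usage note is an assumption about where the image is mapped. It relies on a single well-known fact, namely that an MBR (including the protective MBR of a GPT disk) ends with the signature bytes 0x55 0xAA at offsets 510-511.

```python
SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"  # boot signature stored at byte offsets 510-511


def classify_first_sector(sector: bytes) -> str:
    """Classify the first 512 bytes of a disk image (hypothetical helper).

    Returns one of:
      "zeroed"               - every byte is 0x00
      "signature-present"    - non-zero data ending in the 0x55AA signature
      "nonzero-no-signature" - non-zero data without the signature,
                               which would point to overwritten/random data
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    sector = sector[:SECTOR_SIZE]
    if sector.count(0) == SECTOR_SIZE:
        return "zeroed"
    if sector[510:512] == MBR_SIGNATURE:
        return "signature-present"
    return "nonzero-no-signature"


def classify_device(path: str) -> str:
    """Read the first sector from a block device or image file and classify it."""
    with open(path, "rb") as dev:
        return classify_first_sector(dev.read(SECTOR_SIZE))
```

With the image mapped via rbd, calling `classify_device("/dev/rbd0")` (the path is an assumption; use whatever device the mapping actually produced) would report which of the three cases applies. The call only reads from the device and does not modify the image.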