Re: SSD-Cache Tier + RBD-Cache = Filesystem corruption?

What release of Infernalis are you running? When you encounter this error, is the partition table zeroed out, or does it appear to be random corruption?
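
One rough way to tell the two cases apart is to look at the first sector of an affected image. The following is only a sketch: it assumes the image is mapped via rbd to a block device (the path /dev/rbd0 is a hypothetical example), that sectors are 512 bytes, and that the disk carried an MBR-style partition table, which ends with the bytes 0x55 0xAA. The function names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def read_first_sector(path):
    """Read the first sector of a block device or image file (read-only)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def classify_first_sector(sector):
    """Roughly classify sector 0 of a disk that used to hold an MBR."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least %d bytes" % SECTOR_SIZE)
    sector = sector[:SECTOR_SIZE]
    if not any(sector):
        # Every byte is 0x00: the partition table was zeroed out.
        return "zeroed"
    if sector[510:512] == b"\x55\xaa":
        # The MBR boot signature survived; the table entries may still be bad.
        return "boot-signature-present"
    # Non-zero data without the signature looks more like random corruption.
    return "nonzero-without-signature"
```

For example, classify_first_sector(read_first_sector("/dev/rbd0")) returns one of the three labels above. It only inspects sector 0, so it says nothing about damage elsewhere in the image.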

-- 

Jason Dillaman 

----- Original Message -----
> From: "Udo Waechter" <root@xxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Saturday, February 6, 2016 5:31:51 AM
> Subject:  SSD-Cache Tier + RBD-Cache = Filesystem corruption?
> 
> Hello,
> 
> I am experiencing totally weird filesystem corruptions with the
> following setup:
> 
> * Ceph infernalis on Debian8
> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)
> 
> The SSDs are new in my setup and I am trying to set up a cache tier.
> 
> With the spinning disks, Ceph has been running for about a year without
> any major issues. Replacing disks and all that went fine.
> 
> Ceph is used by rbd+libvirt+kvm with
> 
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
> 
> Also, in libvirt, I have
> 
> cachemode=writeback enabled.
> 
> So far so good.
> 
> Now, I've added the SSD-Cache tier to the picture with "cache-mode
> writeback"
> 
> The SSD-Machine also has "deadline" scheduler enabled.
> 
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed".
> Trying to reboot the machines ends in "No bootable drive"
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.
> 
> testdisk finds the proper ones, e2fsck repairs the filesystem beyond
> usage afterwards.
> 
> This does not happen to all machines; it happens to those that actually
> do some or most of the IO:
> 
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
> 
> Or so I thought. Yesterday one of my LDAP servers died, and that one is
> not doing IO.
> 
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
> 
> I've read through some older mails on the list, where people had similar
> problems and suspected something like that.
> 
> What are the proper/right settings for rbd/qemu/libvirt?
> 
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD-tier: cachemode: writeback
> 
> ?
> 
> Thanks for any help,
> udo.
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 