Hello Kevin,

On Tue, Jan 10, 2017 at 4:21 PM, Kevin Olbrich <ko@xxxxxxx> wrote:
> 5x Ceph node equipped with 32GB RAM, Intel i5, Intel DC P3700 NVMe journal,

Is the "journal" used as a ZIL?

> We experienced a lot of io blocks (X requests blocked > 32 sec) when a lot
> of data is changed in cloned RBDs (disk imported via OpenStack Glance,
> cloned during instance creation by Cinder).
> If the disk was cloned some months ago and large software updates are
> applied (a lot of small files) combined with a lot of syncs, we often had a
> node hit suicide timeout.
> Most likely this is a problem with op thread count, as it is easy to block
> threads with RAIDZ2 (RAID6) if many small operations are written to disk
> (again, COW is not optimal here).
> When recovery took place (0.020% degraded) the cluster performance was very
> bad - remote service VMs (Windows) were unusable. Recovery itself was using
> 70 - 200 mb/s which was okay.

I would think having an SSD ZIL here would make a very large difference. A ZIL would likely have a much larger performance impact than an L2ARC device. [You may even partition the NVMe and have both, though I'm not sure whether that is normally recommended; a rough sketch of such a split is below.]

Thanks for your writeup!

--
Patrick Donnelly
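
For reference, a rough sketch of what the split ZIL (SLOG) + L2ARC layout could look like on the NVMe device. The pool name ("tank"), device path (/dev/nvme0n1), and partition sizes are placeholders, not details taken from Kevin's setup:

    # Carve the NVMe into a small SLOG partition and a larger L2ARC partition.
    sgdisk -n 1:0:+16G /dev/nvme0n1   # partition 1: ZIL/SLOG (a few GB is usually plenty)
    sgdisk -n 2:0:0    /dev/nvme0n1   # partition 2: remainder of the device for L2ARC

    # Attach them to the pool: "log" adds a separate intent log (SLOG),
    # "cache" adds an L2ARC device.
    zpool add tank log   /dev/nvme0n1p1
    zpool add tank cache /dev/nvme0n1p2

    # Confirm both devices show up under the pool.
    zpool status tank

Losing an L2ARC device is harmless, but losing an unmirrored SLOG at the wrong moment can cost the most recent synchronous writes, so if a second NVMe is available the log is often mirrored (zpool add tank log mirror <dev1> <dev2>).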