Hi Oliver,

We put CephFS directly on an 8+2 EC pool (10 nodes, 450 OSDs), but put the
metadata on a replicated pool backed by NVMe drives (1 per node, 5 nodes).
We get great performance with large files but, as Linh indicated, IOPS with
small files could be better.

I did consider adding a replicated SSD tier to improve IOPS. But having seen
very inconsistent performance on a Kraken test cluster that used tiering, I
decided it might not give a worthwhile speed-up, and the added complexity
could make the system fragile. I'd be interested to hear more from Greg
about why cache pools are best avoided...

best regards,

Jake

On 17/07/18 01:55, Linh Vu wrote:
> Hi Oliver,
>
> We have several CephFS on EC pool deployments: one has been in production
> for a while, and the others are about to be, pending all the Bluestore+EC
> fixes in 12.2.7 😊
>
> Firstly, as John and Greg have said, you don't need an SSD cache pool at all.
>
> Secondly, regarding k/m, it depends on how many hosts or racks you have,
> and how many failures you want to tolerate.
>
> For our smallest pool, with only 8 hosts in 4 different racks and 2
> different pairs of switches (note: we consider switch failure more common
> than rack cooling or power failure), we're using k/m = 4/2 with failure
> domain = host. We currently use this for SSD scratch storage for HPC.
>
> For one of our larger pools, with 24 hosts over 6 different racks and 6
> different pairs of switches, we're using k/m = 4/2 with failure
> domain = rack.
>
> For another pool with a similar host count, but not spread over as many
> pairs of switches, we're using k/m = 6/3 and failure domain = host.
>
> Also keep in mind that a higher value of k/m may give you more throughput
> but increase latency, especially for small files, so it also depends on
> how important performance is and what kind of file sizes you store on
> your CephFS.
>
> Cheers,
>
> Linh
>
> ------------------------------------------------------------------------
> *From:* ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
> *Sent:* Sunday, 15 July 2018 9:46:16 PM
> *To:* ceph-users
> *Subject:* CephFS with erasure coding, do I need a cache-pool?
>
> Dear all,
>
> we're planning a new Ceph cluster, with CephFS as the main workload, and
> would like to use erasure coding to use the disks more efficiently. The
> access pattern will probably be more read- than write-heavy, on average.
>
> I don't have any practical experience with erasure-coded pools so far.
>
> I'd be glad for any hints / recommendations regarding these questions:
>
> * Is an SSD cache pool recommended/necessary for CephFS on an
>   erasure-coded HDD pool (using Ceph Luminous and BlueStore)?
>
> * What are good values for k/m for erasure coding in practice (assuming a
>   cluster of about 300 OSDs), to make things robust and ease maintenance
>   (the ability to take a few nodes down)? Is k/m = 6/3 a good choice?
>
> * Will it be sufficient to have k+m racks (i.e. k+m failure domains)?
>
> Cheers and thanks for any advice,
>
> Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
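
For reference, below is a minimal sketch of the kind of layout discussed in
this thread: CephFS data directly on a k=4/m=2 EC pool with failure domain =
host, and metadata on a replicated pool. It assumes Luminous or later with
BlueStore OSDs; the profile and pool names, PG counts, and the device-class
constraint are purely illustrative, so check the CephFS and erasure-code
documentation for your release before running anything like this.

    # Hypothetical names and PG counts throughout; adjust for your cluster.
    # EC profile: k=4, m=2, spread chunks across hosts, restrict to HDD OSDs.
    ceph osd erasure-code-profile set ec42 k=4 m=2 \
        crush-failure-domain=host crush-device-class=hdd

    # Data pool on the EC profile; overwrites must be enabled for CephFS
    # (BlueStore only).
    ceph osd pool create cephfs_data 1024 1024 erasure ec42
    ceph osd pool set cephfs_data allow_ec_overwrites true

    # Metadata pool stays replicated (e.g. on NVMe/SSD via a CRUSH rule).
    ceph osd pool create cephfs_metadata 64 64 replicated

    # Create the filesystem; recent releases require --force when the
    # default data pool is erasure-coded.
    ceph fs new cephfs cephfs_metadata cephfs_data --force

An alternative, if you prefer a replicated default data pool, is to attach
the EC pool as an additional data pool with "ceph fs add_data_pool" and
direct specific directories to it via a file layout.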