Hi Oliver,

We put CephFS directly on an 8+2 EC pool (10 nodes, 450 OSDs), but put the
metadata on a replicated pool backed by NVMe drives (1 per node, 5 nodes).
We get great performance with large files but, as Linh indicated, IOPS with
small files could be better.

I did consider adding a replicated SSD tier to improve IOPS. But having seen
very inconsistent performance on a Kraken test cluster that used tiering, I
decided it might not give a worthwhile speed-up, and the added complexity
could make the system fragile. I'd be interested to hear more from Greg
about why cache pools are best avoided...

best regards,

Jake

On 17/07/18 01:55, Linh Vu wrote:
> Hi Oliver,
>
> We have several CephFS on EC pool deployments: one has been in production
> for a while, and the others are about to be, pending all the Bluestore+EC
> fixes in 12.2.7 😊
>
> Firstly, as John and Greg have said, you don't need an SSD cache pool at all.
>
> Secondly, regarding k/m, it depends on how many hosts or racks you have,
> and how many failures you want to tolerate.
>
> For our smallest pool, with only 8 hosts in 4 different racks and 2
> different pairs of switches (note: we consider switch failure more common
> than rack cooling or power failure), we're using k/m = 4/2 with failure
> domain = host. We currently use this for SSD scratch storage for HPC.
>
> For one of our larger pools, with 24 hosts over 6 different racks and 6
> different pairs of switches, we're using k/m = 4/2 with failure
> domain = rack.
>
> For another pool with a similar host count, but not spread over as many
> pairs of switches, we're using k/m = 6/3 and failure domain = host.
>
> Also keep in mind that a higher value of k/m may give you more throughput
> but increase latency, especially for small files, so it also depends on
> how important performance is and what kind of file sizes you store on
> your CephFS.
>
> Cheers,
>
> Linh
>
> ------------------------------------------------------------------------
> *From:* ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
> *Sent:* Sunday, 15 July 2018 9:46:16 PM
> *To:* ceph-users
> *Subject:* CephFS with erasure coding, do I need a cache-pool?
>
> Dear all,
>
> we're planning a new Ceph cluster, with CephFS as the main workload, and
> would like to use erasure coding to use the disks more efficiently. The
> access pattern will probably be more read- than write-heavy, on average.
>
> I don't have any practical experience with erasure-coded pools so far.
>
> I'd be glad for any hints / recommendations regarding these questions:
>
> * Is an SSD cache pool recommended/necessary for CephFS on an
>   erasure-coded HDD pool (using Ceph Luminous and BlueStore)?
>
> * What are good values for k/m for erasure coding in practice (assuming a
>   cluster of about 300 OSDs), to make things robust and ease maintenance
>   (the ability to take a few nodes down)? Is k/m = 6/3 a good choice?
>
> * Will it be sufficient to have k+m racks (i.e. k+m failure domains)?
>
> Cheers and thanks for any advice,
>
> Oliver

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
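
For reference, below is a minimal sketch of the kind of layout discussed in
this thread: CephFS data directly on a k=4/m=2 EC pool with failure domain =
host, and metadata on a replicated pool. It assumes Luminous or later with
BlueStore OSDs; the profile and pool names, PG counts, and the device-class
constraint are purely illustrative, so check the CephFS and erasure-code
documentation for your release before running anything like this.

    # Hypothetical names and PG counts throughout; adjust for your cluster.
    # EC profile: k=4, m=2, spread chunks across hosts, restrict to HDD OSDs.
    ceph osd erasure-code-profile set ec42 k=4 m=2 \
        crush-failure-domain=host crush-device-class=hdd

    # Data pool on the EC profile; overwrites must be enabled for CephFS
    # (BlueStore only).
    ceph osd pool create cephfs_data 1024 1024 erasure ec42
    ceph osd pool set cephfs_data allow_ec_overwrites true

    # Metadata pool stays replicated (e.g. on NVMe/SSD via a CRUSH rule).
    ceph osd pool create cephfs_metadata 64 64 replicated

    # Create the filesystem; recent releases require --force when the
    # default data pool is erasure-coded.
    ceph fs new cephfs cephfs_metadata cephfs_data --force

An alternative, if you prefer a replicated default data pool, is to attach
the EC pool as an additional data pool with "ceph fs add_data_pool" and
direct specific directories to it via a file layout.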