Re: CephFS with erasure coding, do I need a cache-pool?

Dear Linh,

another question, if I may:

How do you handle the BlueStore WAL and DB, and
how much SSD space do you allocate for them?
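
E.g., do you simply carve one LV per OSD out of a shared SSD and
pass it in as the DB device, something like (device and VG/LV names
below are just for illustration):

  ceph-volume lvm create --bluestore --data /dev/sdc --block.db ceph-db/db-osd-3

or do you also give the WAL its own device or partition?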


Cheers,

Oliver


On 17.07.2018 08:55, Linh Vu wrote:
Hi Oliver,


We have several CephFS-on-EC-pool deployments: one has been in production for a while, the others are about to go into production, pending all the BlueStore+EC fixes in 12.2.7 😊


Firstly, as John and Greg have said, you don't need an SSD cache pool at all.
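
Since Luminous you can point CephFS directly at an EC pool; the main steps are just to enable overwrites on the EC pool (this needs BlueStore OSDs) and add it as a data pool, roughly like this (the pool and fs names are just examples):

  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  ceph fs add_data_pool cephfs cephfs_data_ec

The metadata pool (and ideally the default data pool) should stay replicated.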


Secondly, regarding k/m, it depends on how many hosts or racks you have, and how many failures you want to tolerate.


For our smallest pool with only 8 hosts in 4 different racks and 2 different pairs of switches (note: we consider switch failure more common than rack cooling or power failure), we're using 4/2 with failure domain = host. We currently use this for SSD scratch storage for HPC.


For one of our larger pools, with 24 hosts over 6 different racks and 6 different pairs of switches, we're using 4/2 with failure domain = rack.


For another pool with a similar host count but not spread over so many pairs of switches, we're using 6/3 with failure domain = host.
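
In terms of how that's expressed, those setups roughly correspond to erasure-code profiles like the following (profile/pool names and PG counts are just placeholders):

  ceph osd erasure-code-profile set ec42-host k=4 m=2 crush-failure-domain=host
  ceph osd erasure-code-profile set ec42-rack k=4 m=2 crush-failure-domain=rack
  ceph osd erasure-code-profile set ec63-host k=6 m=3 crush-failure-domain=host
  ceph osd pool create cephfs_data_ec 1024 1024 erasure ec63-host

Keep in mind that k and m can't be changed on an existing pool, so it's worth settling on them before you put data on it.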


Also keep in mind that higher k/m values may give you more throughput but increase latency, especially for small files, so it also depends on how important performance is and what file sizes you store on your CephFS.
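
If you have a real mix of file sizes, one option is to keep the default data pool replicated and only point the large-file directories at the EC pool via a directory layout, e.g. (path and pool name are just examples):

  setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/large_data

New files created under that directory then land on the EC pool, while everything else stays on the replicated pool.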


Cheers,

Linh

------------------------------------------------------------------------
*From:* ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
*Sent:* Sunday, 15 July 2018 9:46:16 PM
*To:* ceph-users
*Subject:*  CephFS with erasure coding, do I need a cache-pool?
Dear all,

we're planning a new Ceph cluster, with CephFS as the
main workload, and would like to use erasure coding to
use the disks more efficiently. The access pattern will
probably be more read- than write-heavy, on average.

I don't have any practical experience with erasure-
coded pools so far.

I'd be glad for any hints / recommendations regarding
these questions:

* Is an SSD cache pool recommended/necessary for
CephFS on an erasure-coded HDD pool (using Ceph
Luminous and BlueStore)?

* What are good values for k/m for erasure coding in
practice (assuming a cluster of about 300 OSDs), to
make things robust and ease maintenance (ability to
take a few nodes down)? Is k/m = 6/3 a good choice?

* Will it be sufficient to have k+m racks (i.e. k+m
failure domains)?


Cheers and thanks for any advice,

Oliver
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



