Re: CephFS with erasure coding, do I need a cache-pool?

On our NLSAS OSD nodes, there is 1x NVMe PCIe card for all the WALs and DBs (we accept the risk of a single card failing, since it's low and our failure domain is host anyway). Each OSD (16 per host) gets 2GB of WAL and 10GB of DB.


On our Flash (SSD but not NVMe) OSD nodes, there are 8 OSDs per node, and 2x NVMe PCIe cards for the WALs and DBs. Each OSD gets 4GB of WAL and 40GB of DB. 
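
To give an idea, one OSD on such a node could be provisioned roughly like this with ceph-volume, carving the WAL and DB out of the shared NVMe card. This is only a sketch; the device, VG and LV names are placeholders, not our actual layout:

    # e.g. 2GB WAL + 10GB DB per OSD from the shared NVMe card (NLSAS case)
    vgcreate nvme-dbwal /dev/nvme0n1
    lvcreate -L 2G -n wal-sdb nvme-dbwal
    lvcreate -L 10G -n db-sdb nvme-dbwal
    ceph-volume lvm create --bluestore --data /dev/sdb \
        --block.db nvme-dbwal/db-sdb --block.wal nvme-dbwal/wal-sdb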


On our upcoming NVMe OSD nodes, for obvious reasons, we don't do any such special allocation. 😊


Cheers,

Linh



From: Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
Sent: Tuesday, 17 July 2018 11:39:26 PM
To: Linh Vu; ceph-users
Subject: Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?
 
Dear Linh,

another question, if I may:

How do you handle Bluestore WAL and DB, and
how much SSD space do you allocate for them?


Cheers,

Oliver


On 17.07.2018 08:55, Linh Vu wrote:
> Hi Oliver,
>
>
> We have several CephFS on EC pool deployments: one has been in production
> for a while, and the others are about to be, pending all the Bluestore+EC
> fixes in 12.2.7 😊
>
>
> Firstly, as John and Greg have said, you don't need an SSD cache pool at all.
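
Since Luminous, a CephFS data pool can sit directly on erasure coding once
overwrites are enabled on the EC pool, which is what makes the cache tier
unnecessary. A minimal sketch, assuming an EC pool and filesystem with
placeholder names cephfs-ec-data and cephfs:

    ceph osd pool set cephfs-ec-data allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs-ec-data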
>
>
> Secondly, regarding k/m, it depends on how many hosts or racks you have,
> and how many failures you want to tolerate.
>
>
> For our smallest pool with only 8 hosts in 4 different racks and 2
> different pairs of switches (note: we consider switch failure more
> common than rack cooling or power failure), we're using 4/2 with failure
> domain = host. We currently use this for SSD scratch storage for HPC.
>
>
> For one of our larger pools, with 24 hosts over 6 different racks and 6
> different pairs of switches, we're using 4/2 with failure domain = rack.
>
>
> For another pool with similar host count but not spread over so many
> pairs of switches, we're using 6/3 and failure domain = host.
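
For illustration, a 4/2 profile with failure domain = rack like the larger
pool described above could be created roughly as follows; the profile and
pool names and the PG count are placeholders, not the actual configuration:

    ceph osd erasure-code-profile set ec42-rack \
        k=4 m=2 crush-failure-domain=rack
    ceph osd pool create cephfs-ec-data 1024 1024 erasure ec42-rack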
>
>
> Also keep in mind that higher values of k and m may give you more
> throughput but increase latency, especially for small files, so it also
> depends on how important performance is and what file sizes you store
> on your CephFS.
>
>
> Cheers,
>
> Linh
>
> ------------------------------------------------------------------------
> *From:* ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
> *Sent:* Sunday, 15 July 2018 9:46:16 PM
> *To:* ceph-users
> *Subject:* [ceph-users] CephFS with erasure coding, do I need a cache-pool?
> Dear all,
>
> we're planning a new Ceph cluster, with CephFS as the
> main workload, and would like to use erasure coding to
> use the disks more efficiently. Access pattern will
> probably be more read- than write-heavy, on average.
>
> I don't have any practical experience with erasure-
> coded pools so far.
>
> I'd be glad for any hints / recommendations regarding
> these questions:
>
> * Is an SSD cache pool recommended/necessary for
> CephFS on an erasure-coded HDD pool (using Ceph
> Luminous and BlueStore)?
>
> * What are good values for k/m for erasure coding in
> practice (assuming a cluster of about 300 OSDs), to
> make things robust and ease maintenance (ability to
> take a few nodes down)? Is k/m = 6/3 a good choice?
>
> * Will it be sufficient to have k+m racks, i.e. k+m failure
> domains?
>
>
> Cheers and thanks for any advice,
>
> Oliver
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
