I think the P4600 should be fine, although 2TB is probably way overkill for 15 OSDs.
Our older nodes use the P3700 400GB for 16 OSDs. I have yet to see the
WAL and DB fill up at 2GB/10GB each. Our newer nodes use the Intel
Optane 900P 480GB, which is actually faster than the P4600, significantly
cheaper in our country (we bought ~100 OSD nodes recently, so that was a
big saving), and rated for a solid 10 DWPD. For NLSAS OSDs, even the
older P3700 is more than enough, but for our flash OSDs, the Optane 900P
performs a lot better. It's about 2x faster than the P3700 we had, and
allows us to get more out of our flash drives.
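Just to put numbers on it, here's a quick back-of-the-envelope sketch (a
few lines of Python, purely illustrative -- the per-OSD WAL/DB sizes are
simply the ones we use in this thread, not an official sizing rule, and
the helper name is just made up for the example):

    # WAL/DB footprint estimate, using the per-OSD sizes from this thread.
    def wal_db_footprint_gb(num_osds, wal_gb=2, db_gb=10):
        """Total NVMe space consumed by WAL + DB partitions for num_osds OSDs."""
        return num_osds * (wal_gb + db_gb)

    print(wal_db_footprint_gb(16))        # our old nodes: 192 GB on a 400 GB P3700
    print(wal_db_footprint_gb(15))        # your proposal: 180 GB on a 2 TB NVMe
    print(wal_db_footprint_gb(8, 4, 40))  # our flash nodes: 352 GB across 2 cards

So capacity is not the constraint; it's really the IOPS and endurance of
the device that matter.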
From: Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
Sent: Wednesday, 18 July 2018 12:00:14 PM
To: Linh Vu; ceph-users
Subject: Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?

Thanks, Linh!
A question regarding choice of NVMe - do you think an Intel P4510 or
P4600 would do well for WAL+DB? I'm thinking about using a single 2 TB
NVMe for 15 OSDs. Would you recommend a different model? Is there any
experience on how many 4k IOPS one should have for WAL+DB per OSD?

We have a few new BlueStore nodes in an older cluster, and we use Intel
Optanes for WAL. We wanted to use them for DB too - only to learn that,
while fast, they're just too small for the DB of several OSDs ... so I
hope a "regular" NVMe is fast enough?

We currently use the Gigabyte D120-C21 server barebone
(https://b2b.gigabyte.com/Storage-Server/D120-C21-rev-100) for our OSD
nodes, and we'd like to use it in our next cluster too, because of the
high storage density and the good HDD-price to server-price ratio. But
it can only fit a single NVMe drive (we use one of the 16 HDD slots for
a U.2 drive and connect it to the single M.2-PCIe slot on the
mainboard).

Cheers,

Oliver

On 18.07.2018 09:11, Linh Vu wrote:
> On our NLSAS OSD nodes, there is 1x NVMe PCIe card for all the WALs and
> DBs (we accept that the risk of 1 card failing is low, and our failure
> domain is host anyway). Each OSD (16 per host) gets 2GB of WAL and 10GB
> of DB.
>
> On our Flash (SSD but not NVMe) OSD nodes, there are 8 OSDs per node,
> and 2x NVMe PCIe cards for the WALs and DBs. Each OSD gets 4GB of WAL
> and 40GB of DB.
>
> On our upcoming NVMe OSD nodes, for obvious reasons, we don't do any such
> special allocation. 😊
>
> Cheers,
>
> Linh
>
> ------------------------------------------------------------------------
> *From:* Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
> *Sent:* Tuesday, 17 July 2018 11:39:26 PM
> *To:* Linh Vu; ceph-users
> *Subject:* Re: [ceph-users] CephFS with erasure coding, do I need a cache-pool?
>
> Dear Linh,
>
> another question, if I may:
>
> How do you handle Bluestore WAL and DB, and
> how much SSD space do you allocate for them?
>
> Cheers,
>
> Oliver
>
> On 17.07.2018 08:55, Linh Vu wrote:
> > Hi Oliver,
> >
> > We have several CephFS on EC pool deployments: one has been in production
> > for a while, the others are about to go in, pending all the Bluestore+EC
> > fixes in 12.2.7 😊
> >
> > Firstly, as John and Greg have said, you don't need an SSD cache pool at all.
> >
> > Secondly, regarding k/m, it depends on how many hosts or racks you have,
> > and how many failures you want to tolerate.
> >
> > For our smallest pool with only 8 hosts in 4 different racks and 2
> > different pairs of switches (note: we consider switch failure more
> > common than rack cooling or power failure), we're using 4/2 with failure
> > domain = host. We currently use this for SSD scratch storage for HPC.
> >
> > For one of our larger pools, with 24 hosts over 6 different racks and 6
> > different pairs of switches, we're using 4:2 with failure domain = rack.
> >
> > For another pool with similar host count but not spread over so many
> > pairs of switches, we're using 6:3 and failure domain = host.
> >
> > Also keep in mind that a higher value of k/m may give you more
> > throughput but increase latency, especially for small files, so it also
> > depends on how important performance is and what kind of file size you
> > store on your CephFS.
> > Cheers,
> >
> > Linh
> >
> > ------------------------------------------------------------------------
> > *From:* ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of
> > Oliver Schulz <oliver.schulz@xxxxxxxxxxxxxx>
> > *Sent:* Sunday, 15 July 2018 9:46:16 PM
> > *To:* ceph-users
> > *Subject:* [ceph-users] CephFS with erasure coding, do I need a cache-pool?
> >
> > Dear all,
> >
> > we're planning a new Ceph cluster, with CephFS as the
> > main workload, and would like to use erasure coding to
> > use the disks more efficiently. Access pattern will
> > probably be more read- than write-heavy, on average.
> >
> > I don't have any practical experience with erasure-
> > coded pools so far.
> >
> > I'd be glad for any hints / recommendations regarding
> > these questions:
> >
> > * Is an SSD cache pool recommended/necessary for
> >   CephFS on an erasure-coded HDD pool (using Ceph
> >   Luminous and BlueStore)?
> >
> > * What are good values for k/m for erasure coding in
> >   practice (assuming a cluster of about 300 OSDs), to
> >   make things robust and ease maintenance (ability to
> >   take a few nodes down)? Is k/m = 6/3 a good choice?
> >
> > * Will it be sufficient to have k+m racks, resp. failure
> >   domains?
> >
> > Cheers and thanks for any advice,
> >
> > Oliver
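PS: On the k/m question quoted above - the capacity/failure-domain
trade-off is easy to put into numbers. A tiny Python sketch (illustrative
only; the profiles are just the ones discussed in this thread):

    # EC trade-offs for the k/m profiles discussed in this thread:
    # usable capacity is k / (k + m), placement needs at least k + m
    # failure domains, and you can lose up to m of them.
    for k, m in [(4, 2), (6, 3)]:
        usable = k / (k + m)
        print(f"k={k} m={m}: {usable:.0%} usable, "
              f">= {k + m} failure domains, tolerates {m} failures")

    # k=4 m=2: 67% usable, >= 6 failure domains, tolerates 2 failures
    # k=6 m=3: 67% usable, >= 9 failure domains, tolerates 3 failures

Both happen to give the same 67% usable space; the difference is purely
in how many failure domains you need and how many you can afford to lose.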
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com