Re: Using CephFS in High Performance (and Throughput) Compute Use Cases

Hi Manuel,


I was the one who did Red Hat's IO500 CephFS submission.  Feel free to ask any questions you like.  Generally speaking, I could achieve 3GB/s pretty easily per kernel client and up to about 8GB/s per client with libcephfs directly (up to the aggregate cluster limits, assuming enough concurrency).

Metadata is trickier.  The fastest option is if you have files spread across directories that you can manually pin round-robin to MDSes, though you can do somewhat well with ephemeral pinning too as a more automatic option.  If you have lots of clients dealing with lots of files in a single directory, that's where you fall back to dynamic subtree partitioning, which tends to be quite a bit slower (though at least some of this is due to journaling overhead on the auth MDS).  That's especially true if you have a significant number of active/active MDS servers (say 10-20+).  We tended to do consistently well on the "easy" IO500 tests and struggled more with the "hard" tests.

Otherwise most of the standard Ceph caveats apply: replication eats into write performance, scrub/deep scrub can impact performance, choosing the right NVMe drive with power loss protection and low overhead is important, etc.
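
For reference, both pinning modes are just extended attributes on a directory, so they're easy to script.  Here's a rough sketch (not from our submission); the mount point /mnt/cephfs/projects and the four active MDS ranks are hypothetical placeholders for your own layout:

    # Rough sketch: spreading per-project metadata load across MDS ranks.
    # PROJECT_ROOT and MDS_RANKS below are assumptions, not from our setup.
    import os

    MDS_RANKS = 4                            # number of active MDS daemons (assumed)
    PROJECT_ROOT = "/mnt/cephfs/projects"    # hypothetical CephFS mount point

    # Manual round-robin pinning: pin each top-level directory to a fixed
    # MDS rank via the ceph.dir.pin vxattr.
    for i, name in enumerate(sorted(os.listdir(PROJECT_ROOT))):
        path = os.path.join(PROJECT_ROOT, name)
        if os.path.isdir(path):
            os.setxattr(path, "ceph.dir.pin", str(i % MDS_RANKS).encode())

    # More automatic alternative: ephemeral distributed pinning, which hashes
    # the immediate children of PROJECT_ROOT across the active MDS ranks.
    os.setxattr(PROJECT_ROOT, "ceph.dir.pin.distributed", b"1")

The same xattrs can of course be set from the shell with setfattr.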


Probably the most important questions to ask yourself are how you intend to use the storage, what you need out of it, and what you need to do to get there.  Ceph has a lot of advantages regarding replication, self-healing, and consistency, and it's quite fast for some workloads given those advantages.  There are some workloads, though (say, unaligned small writes from hundreds of clients to random files in a single directory), that could potentially be pretty slow.


Mark


On 7/21/21 8:54 AM, Manuel Holtgrewe wrote:
Dear all,

we are looking towards setting up an all-NVMe CephFS instance in our
high-performance compute system. Does anyone have any experience to share
with an HPC setup or an NVMe setup mounted by dozens of nodes or more?

I've followed the impressive work done at CERN on YouTube, but otherwise
there appear to be only a few places using CephFS this way. There are a few
CephFS-as-enterprise-storage vendors that sporadically advertise CephFS
for HPC, but it does not appear to be a strategic main target for them.

I'd be happy to read about your experience/opinion on CephFS for HPC.

Best wishes,
Manuel

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx