Re: CephFS performance

In addition to not having resiliency by default, my recollection is
that BeeGFS also doesn't guarantee metadata durability in the event of
a crash or hardware failure like CephFS does. There's not really a way
for us to catch up to their "in-memory metadata IOPS" with our
"on-disk metadata IOPS". :(

If that kind of cached performance is your main concern, CephFS is
probably not going to make you happy.

That said, if you've been happy using CephFS with hard drives and
gigabit ethernet, it will be much faster if you store the metadata on
SSD and can increase the size of the MDS cache in memory. More
specific tuning options than that would depend on your workload.
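
A minimal sketch of what that can look like (assuming the new SSDs carry the "ssd" device class and the metadata pool has the default "cephfs_metadata" name; adjust for your setup):

    # pin the CephFS metadata pool to SSD OSDs
    ceph osd crush rule create-replicated meta-on-ssd default host ssd
    ceph osd pool set cephfs_metadata crush_rule meta-on-ssd

    # give the MDS a larger in-memory cache, e.g. 16 GiB
    ceph config set mds mds_cache_memory_limit 17179869184
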
-Greg

On Tue, Nov 22, 2022 at 7:28 AM David C <dcsysengineer@xxxxxxxxx> wrote:
>
> My understanding is BeeGFS doesn't offer data redundancy by default,
> you have to configure mirroring. You've not said how your Ceph cluster
> is configured but my guess is you have the recommended 3x replication
> - I wouldn't be surprised if BeeGFS was much faster than Ceph in this
> case. I'd be interested to see your results after ensuring equivalent
> data redundancy between the platforms.
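>
> For an apples-to-apples check, a quick sketch of what I'd compare first (pool names here assume the CephFS defaults):
>
>     ceph osd pool get cephfs_data size
>     ceph osd pool get cephfs_metadata size
>
> and then look at how (or whether) mirroring is configured on the BeeGFS side.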
>
> On Thu, Oct 20, 2022 at 9:02 PM quaglio@xxxxxxxxxx <quaglio@xxxxxxxxxx> wrote:
> >
> > Hello everyone,
> >     I have some considerations and questions to ask...
> >
> >     I work at an HPC center and my questions stem from performance in this environment. All of our clusters were suffering from NFS performance problems and from the single point of failure that NFS has.
> >
> >     At that time, we decided to evaluate some of the available software-defined storage (SDS) solutions, and the one chosen was Ceph (first for its resilience and later for its performance).
> >     I deployed CephFS on a small cluster: 6 nodes, 1 HDD per machine, with a 1 Gbps connection.
> >     The performance was as good as a large NFS server we have on another cluster (while costing much less). In addition, I was able to evaluate all the resiliency benefits Ceph offers (such as losing an OSD, MDS, MON, or MGR server and having the objects/services settle on other nodes), all in a way that users did not even notice.
> >
> >     Given this, a new storage cluster was acquired last year with 6 machines and 22 disks (HDDs) per machine. The priority was the amount of available capacity (GB); IOPS were not so important at that time.
> >
> >     Right at the beginning, I had a lot of work to do to optimize the cluster's performance (the main deficiency was metadata read/write performance). The problem was not job execution, but the users' perception of slowness when running interactive commands (my impression was that Ceph metadata was the bottleneck).
> >     There were a few months of high loads in which storage was the bottleneck of the environment.
> >
> >     After a lot of research in the documentation, I made several optimizations to the available parameters, and currently CephFS is able to reach around 10k IOPS (using size=2).
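> >
> >     (Here "size=2" means 2x pool replication, i.e. roughly the effect of something like:
> >
> >         ceph osd pool set cephfs_data size 2
> >
> >     with the default data pool name assumed, rather than the usual size=3.)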
> >
> >     Anyway, my boss asked for other solutions to be evaluated because of the performance issue.
> >     First of all, it was suggested to put the metadata on SSDs to get more IOPS.
> >     In addition, a test environment was set up, and the solution that made the most difference in performance was BeeGFS.
> >
> >     In some situations, BeeGFS is many times faster than Ceph in the same tests and under the same hardware conditions. This happens for both throughput (BW) and IOPS.
> >
> >     We tested it using io500, as follows (a sketch of the kind of invocation used follows this list):
> >     1-) A single process
> >     2-) 8 processes (4 processes on each of 2 machines)
> >     3-) 16 processes (8 processes on each of 2 machines)
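> >
> >     The io500 runs are driven through MPI; the 16-process case, for example, is an invocation of roughly this shape, with the hostfile, binary path, and config file name here being illustrative rather than the exact ones used:
> >
> >         mpirun -np 16 --hostfile ./hosts ./io500 config-cephfs.ini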
> >
> >     I did tests configuring CephFS to use:
> >     * HDD only (for both data and metadata)
> >     * Metadata on SSD
> >     * Using Linux FSCache features
> >     * With some optimizations (increasing MDS memory, client memory, in-flight parameters, etc.; a sketch follows this list)
> >     * Cache tier with SSD
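> >
> >     The "some optimizations" item above boils down to client/MDS options of this kind; a sketch with illustrative values, not the exact ones applied:
> >
> >         ceph config set client client_oc_size 419430400            # larger client object cache (400 MiB)
> >         ceph config set client objecter_inflight_ops 2048          # more in-flight requests per client
> >         ceph config set client objecter_inflight_op_bytes 536870912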
> >
> >     Even so, the benchmark scores were lower than those of BeeGFS installed without any optimization. This difference becomes even more evident as the number of simultaneous accesses increases.
> >
> >     The two best CephFS results were with metadata on SSD and with a cache tier on SSD.
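> >
> >     (A cache tier of that kind, "cephcache" in the tables below, is typically set up along these lines, with the cache pool name illustrative:
> >
> >         ceph osd tier add cephfs_data cache_ssd
> >         ceph osd tier cache-mode cache_ssd writeback
> >         ceph osd tier set-overlay cephfs_data cache_ssd
> >
> >     plus the usual hit_set and target_max_bytes settings.)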
> >
> >     Here are Ceph's results compared to BeeGFS:
> >
> > Bandwidth Test (bw is in GB/s):
> >
> > ==========================================
> > | fs             | bw (GB/s) | processes |
> > ==========================================
> > | beegfs-metassd | 0.078933  | 01        |
> > | beegfs-metassd | 0.051855  | 08        |
> > | beegfs-metassd | 0.039459  | 16        |
> > ==========================================
> > | cephmetassd    | 0.022489  | 01        |
> > | cephmetassd    | 0.009789  | 08        |
> > | cephmetassd    | 0.002957  | 16        |
> > ==========================================
> > | cephcache      | 0.023966  | 01        |
> > | cephcache      | 0.021131  | 08        |
> > | cephcache      | 0.007782  | 16        |
> > ==========================================
> >
> > IOPS Test:
> >
> > ==========================================
> > | fs             | iops      | processes |
> > ==========================================
> > | beegfs-metassd | 0.740658  | 01        |
> > | beegfs-metassd | 3.508879  | 08        |
> > | beegfs-metassd | 6.514768  | 16        |
> > ==========================================
> > | cephmetassd    | 1.224963  | 01        |
> > | cephmetassd    | 3.762794  | 08        |
> > | cephmetassd    | 3.188686  | 16        |
> > ==========================================
> > | cephcache      | 1.829107  | 01        |
> > | cephcache      | 3.257963  | 08        |
> > | cephcache      | 3.524081  | 16        |
> > ==========================================
> >
> >     I imagine that if I test with 32 processes, BeeGFS will look even better.
> >
> >     Do you have any recommendations for me to apply to Ceph without reducing resilience?
> >
> > Rafael.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


