Re: anyone using CephFS for HPC?


On 06/14/2015 06:53 PM, Nigel Williams wrote:
> On 12/06/2015 3:41 PM, Gregory Farnum wrote:
>> ... and the test evaluation was on repurposed Lustre
>> hardware so it was a bit odd, ...

> Agree, it was old (at least by now) DDN kit (SFA10K?) and not ideally
> suited for Ceph (really high OSD per host ratio).

FWIW, I did most of the performance work on the Ceph side for that paper; let me know if you are interested in any of the details. It was definitely not ideal, though in the end I think we did relatively well. Ultimately the lack of SSD journals hurt us: we hit the IB limit to the SFA10K long before we hit the disk limits, and we were topping out at about 6-8 GB/s for sequential reads when we should have been able to hit 12 GB/s. We have also seen some cases where filestore doesn't do large reads as quickly as you'd think (newstore seems to do better).
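
For anyone poking at a similar setup: with filestore, moving the OSD journals onto SSD/NVMe is the obvious first change, and it's only a couple of ceph.conf options per OSD. A minimal sketch (the partition labels are made up, not from our actual setup):

    [osd]
        osd journal size = 10240        # 10 GB filestore journal
    [osd.0]
        osd journal = /dev/disk/by-partlabel/ceph-journal-0   # partition on SSD/NVMe
    [osd.1]
        osd journal = /dev/disk/by-partlabel/ceph-journal-1

The idea is to keep the journal's double write off the data disks (and, in a setup like ours, off the IB link to the backend array).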

The big things that took a lot of effort to figure out during this testing were:

- General strangeness with cache mirroring on the SFA10K *really* hurting performance with Ceph (not sure why it didn't hurt Lustre as badly).
- Back around kernel 3.6 there were some nasty VM compaction issues that caused major performance problems (see the knobs sketched below).
- Somewhat strange mdtest results. Probably just issues in the MDS back then.
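
On the kernel ~3.6 VM compaction problems: the following isn't from the paper, just the sort of workaround that was common at the time (and the sysfs path differs on RHEL-derived kernels), but it gives an idea of the knobs involved:

    # stop transparent hugepage defrag from triggering compaction stalls
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    # keep more free memory around so allocations don't block on compaction
    sysctl -w vm.min_free_kbytes=524288    # value is illustrative; size it to the box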


>> Sage's thesis or some of the earlier papers will be happy to tell you
>> all the ways in which Ceph > Lustre, of course, since creating a
>> successor is how the project started. ;)
>> -Greg

> Thanks Greg, yes those original documents have been well-thumbed; but I
> was hoping someone had done a more recent comparison given the
> significant improvements over the last couple of Ceph releases.

> My superficial poking about in Lustre doesn't reveal to me anything
> particularly compelling in the design or typical deployments that would
> magically yield higher performance than an equally well-tuned Ceph
> cluster. Blair Bethwaite commented that Lustre client-side write caching
> might be more effective than CephFS's at the moment.

I suspect the big things are:

- Lustre doesn't do replication in software (it relies on hardware RAID for redundancy).
- Lustre may have more of its tuning issues worked out.
- Lustre doesn't (last I checked) do full data journaling (see the back-of-envelope below).
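
To put a rough back-of-envelope on the replication and data-journaling points (numbers illustrative, not measurements): with filestore and co-located journals, every client byte is written twice (journal, then data), so raw disk write bandwidth is effectively halved; add 3x replication and aggregate client write throughput is roughly total disk bandwidth divided by 6. Lustre writing full stripes through hardware RAID6 pays only the parity overhead, which goes a long way toward explaining the sequential-write gap.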

Frankly, a well-tuned Lustre configuration is going to do pretty well for large sequential IO; that's pretty much its bread and butter. At least historically it hasn't been great at small random IO, and most Lustre setups rely on some kind of STONITH failover for node outages, which is obviously not nearly as graceful as what Ceph does.




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


