Re: Usage pattern and design of Ceph

On Monday, August 19, 2013, Guang Yang wrote:
Thanks Greg.

Some comments inline...

On Sunday, August 18, 2013, Guang Yang wrote:
Hi ceph-users,
This is Guang and I am pretty new to ceph, glad to meet you guys in the community!

After walking through some documents of Ceph, I have a couple of questions:
  1. Is there any comparison between Ceph and AWS S3, in terms of the ability to handle different workloads (from KB to GB), with a corresponding performance report?

Not really; any comparison would be highly biased depending on your Amazon ping and your Ceph cluster. We've got some internal benchmarks where Ceph looks good, but they're not anything we'd feel comfortable publishing.
 [Guang] Yeah, I mean solely the server-side time, regardless of the RTT impact on the comparison.
  2. Looking at some industry solutions for distributed storage, GFS / Haystack / HDFS all use a metadata server to keep the logical-to-physical mapping in memory and avoid disk I/O lookups when reading a file. Is that concern valid for Ceph (in terms of the latency to read a file)?

These are very different systems. Thanks to CRUSH, RADOS doesn't need to do any IO to find object locations; CephFS only does IO if the inode you request has fallen out of the MDS cache (not terribly likely in general). This shouldn't be an issue...
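To make that concrete, here is a toy sketch (this is not the real CRUSH algorithm, just an illustration of the idea): placement is a pure function of the object name and a cached cluster map, so locating an object needs no disk or network lookup.

    import hashlib

    # Toy placement function, NOT real CRUSH -- it only illustrates that an
    # object's location can be computed from the object name plus a cached
    # cluster map, with no lookup I/O.
    def place(object_name, cluster_map, num_pgs=128, replicas=3):
        h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
        pg = h % num_pgs                      # object name -> placement group
        n = len(cluster_map)
        return [cluster_map[(pg + i) % n] for i in range(replicas)]

    print(place("photo-123", ["osd.0", "osd.1", "osd.2", "osd.3"]))
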
[Guang] " CephFS only does IO if the inode you request has fallen out of the MDS cache", my understanding is, if we use CephFS, we will need to interact with Rados twice, the first time to retrieve meta-data (file attribute, owner, etc.) and the second time to load data, and both times will need disk I/O in terms of inode and data. Is my understanding correct? The way some other storage system tried was to cache the file handle in memory, so that it can avoid the I/O to read inode in.

In the worst case this can happen with CephFS, yes. However, the client is not accessing metadata directly; it's going through the MetaData Server, which caches (lots of) metadata on its own, and the client can get leases as well (so it doesn't need to go to the MDS for each access, and can cache information on its own). The typical case is going to depend quite a lot on your scale.
That said, I'm not sure why you'd want to use CephFS for a small-object store when you could just use raw RADOS and avoid all the POSIX overheads. Perhaps I've misunderstood your use case?
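For reference, a rough sketch of what that looks like with the python-rados bindings (the 'photos' pool and the object name are just placeholders for this example):

    import rados

    # Connect to the cluster using the local ceph.conf and keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # 'photos' is a hypothetical pool assumed to exist already.
        ioctx = cluster.open_ioctx('photos')
        try:
            # One RADOS object per photo; no directories, inodes, or POSIX layer.
            ioctx.write_full('photo-123', b'...jpeg bytes...')
            data = ioctx.read('photo-123', length=4 * 1024 * 1024)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
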
-Greg

  3. Some industry research shows that one issue with file systems is the metadata-to-data ratio, in terms of both access and storage, and some systems combine small files into large physical files to reduce that ratio (Haystack, for example). If we want to use Ceph to store photos, should this be a concern, given that Ceph uses one physical file per object?

...although this might be. The issue basically comes down to how many disk seeks are required to retrieve an item, and one way to reduce that number is to hack the filesystem by keeping a small number of very large files and calculating (or caching) where different objects are inside those files. Since Ceph is designed for MB-sized objects, it doesn't go to these lengths to optimize that path the way Haystack might (I'm not familiar with Haystack in particular).
That said, you need some pretty extreme latency requirements before this becomes an issue and if you're also looking at HDFS or S3 I can't imagine you're in that ballpark. You should be fine. :)
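As a side note, the packing trick described above boils down to something like this toy sketch: append small blobs to one large file and keep an in-memory index of offsets, so a read needs at most one seek. Purely illustrative, not how Ceph stores objects.

    class PackedStore:
        """Toy Haystack-style store: pack many small blobs into one large file."""

        def __init__(self, path):
            self.path = path
            self.index = {}                   # name -> (offset, length), kept in memory
            open(self.path, 'ab').close()     # make sure the data file exists

        def put(self, name, data):
            with open(self.path, 'ab') as f:
                offset = f.tell()             # append position in the big file
                f.write(data)
            self.index[name] = (offset, len(data))

        def get(self, name):
            offset, length = self.index[name]  # no metadata I/O, index is in RAM
            with open(self.path, 'rb') as f:
                f.seek(offset)                 # at most one seek per read
                return f.read(length)
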
[Guang] Yep, that makes a lot of sense.
-Greg


--
Software Engineer #42 @ http://inktank.com | http://ceph.com