Re: Usage pattern and design of Ceph

Hi,

On 2013-08-19 15:48, Guang Yang wrote:
> After walking through some of the Ceph documentation, I have a couple of questions: 1. Is there any comparison between Ceph and AWS S3, in terms of the ability to handle different workloads (from KB to GB), with a corresponding performance report?
No idea; I've not seen one, but I haven't gone looking either. I think that I've seen mention of benchmarks, though.

> 2. Looking at some industry solutions for distributed storage, GFS / Haystack / HDFS all use a metadata server to hold the logical-to-physical mapping in memory and avoid disk I/O lookups when reading files; is that concern valid for Ceph (in terms of latency to read a file)?
I'd imagine that with enough memory on the host running the MDS, it would be equivalent to explicitly holding everything in memory: enough data would be buffered that there's nearly no disk I/O needed for metadata lookups. You're still going to have I/O for writes, but that's unavoidable if you want to maintain data integrity.

I haven't a clue whether there's any kind of striping between multiple metadata servers if you have more metadata in flight than can comfortably fit in memory on a single host, but given that you can cram 48G of RAM into a machine with an Intel CPU (at the current 8G/DIMM) without needing to go to a multi-socket motherboard, it would take quite some effort to reach that state.
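For a rough sense of scale, here's a back-of-envelope sketch (my own assumed per-inode cache cost, not a measured figure) of how many files' metadata a 48G host could keep resident:

    # Back-of-envelope check with assumed numbers, not measurements:
    # if a cached inode/dentry costs on the order of a couple of KB of
    # MDS memory, how many files' metadata fit on a 48G host?

    BYTES_PER_INODE = 2 * 1024      # assumed ~2 KB per cached inode+dentry
    MDS_RAM = 48 * 1024**3          # 48G host, as above
    USABLE_FRACTION = 0.8           # leave headroom for the OS and daemons

    cacheable_files = int(MDS_RAM * USABLE_FRACTION / BYTES_PER_INODE)
    print(f"~{cacheable_files:,} files' metadata cacheable in RAM")
    # ~20,132,659 files' metadata cacheable in RAM

On those assumptions you'd need tens of millions of actively-used files before metadata stopped fitting in memory on a single host.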

> 3. Some industry research shows that one issue for file systems is the metadata-to-data ratio, in terms of both access and storage, and some techniques combine small files into large physical files to reduce that ratio (Haystack, for example). If we want to use Ceph to store photos, should this be a concern, given that Ceph uses one physical file per object?
What would the average object size be? The default size for a chunk/slice/...? in RBD is 4M (also the default extent size in LVM); I presume it's not just a random number pulled out of the air, and that there's at least some vague thought given to balancing the data/metadata ratio.
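Purely as an illustration (with an assumed average photo size, not real numbers), packing small photos into 4M objects cuts the number of objects the cluster has to track compared to storing one object per photo:

    # Rough illustration of the metadata-to-data ratio question, using
    # assumed sizes: small photos stored one object each vs. packed into
    # 4M extents, Haystack-style.

    PHOTO_SIZE = 100 * 1024             # assume an average 100 KB photo
    EXTENT_SIZE = 4 * 1024**2           # the 4M default mentioned above
    PHOTOS = 1_000_000

    objects_one_per_photo = PHOTOS
    objects_packed = -(-PHOTOS * PHOTO_SIZE // EXTENT_SIZE)  # ceiling division

    print(f"one object per photo  : {objects_one_per_photo:,} objects to track")
    print(f"packed into 4M extents: {objects_packed:,} objects to track")
    # one object per photo  : 1,000,000 objects to track
    # packed into 4M extents: 24,415 objects to track

That's roughly a 40x reduction in objects for 100 KB photos, which is the same trade Haystack makes by combining small files into large physical files.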

--
Martin Rudat


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



