Re: Usage of CEPH FS versus HDFS for Hadoop: TeraSort benchmark performance comparison issue

Is the chunk size tunable in a Ceph cluster? I don't mean dynamically, but is it at least statically configurable when a cluster is first installed?

Thanks,
Cameron

Sent from my iPhone

On Dec 13, 2012, at 9:41 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:

> On Thu, Dec 13, 2012 at 9:27 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> Hi Jutta,
>> 
>> On Thu, 13 Dec 2012, Lachfeld, Jutta wrote:
>>> Hi all,
>>> 
>>> I am currently doing some comparisons between CEPH FS and HDFS as the file system for Hadoop, using Hadoop's integrated TeraSort benchmark. This benchmark first generates the specified amount of data, e.g. 1TB, in the file system used by Hadoop, then sorts the data via Hadoop's MapReduce framework and writes the sorted output back to the same file system. The benchmark measures the elapsed time of a sort run.
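>>>
>>> (For reference, a typical TeraGen/TeraSort run on a Hadoop 1.x cluster looks roughly like the following; the examples jar name and the paths are only illustrative and vary by installation:
>>>
>>>   hadoop jar hadoop-examples.jar teragen 10000000000 /terasort/input     # 10^10 rows of 100 bytes each, i.e. ~1TB
>>>   hadoop jar hadoop-examples.jar terasort /terasort/input /terasort/output
>>>
>>> The elapsed time of the terasort step is what is compared below.)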
>>> 
>>> I am wondering about my best result achieved with CEPH FS in comparison to the ones achieved with HDFS. With CEPH, the benchmark runs noticeably longer: about 1.2x the runtime of an HDFS run using the default HDFS block size of 64MB, and about 1.5x the runtime of an HDFS run using an HDFS block size of 512MB.
>>> 
>>> Could you please take a look at the configuration? Perhaps some key factor already catches your eye, e.g. the CEPH version.
>>> 
>>> OS: SLES 11 SP2
>>> 
>>> CEPH:
>>> OSDs are distributed over several machines.
>>> There is 1 MON and 1 MDS process on yet another machine.
>>> 
>>> Replication of the data pool is set to 1.
>>> Underlying file systems for data are btrfs.
>>> Mount options are only "rw,noatime".
>>> For each CEPH OSD, we use a RAM disk of 256MB for the journal.
>>> Package ceph has version 0.48-13.1, package ceph-fuse has version 0.48-13.1.
>>> 
>>> HDFS:
>>> HDFS is distributed over the same machines.
>>> HDFS name node on yet another machine.
>>> 
>>> Replication level is set to 1.
>>> HDFS block size is set to 64MB or even 512MB.
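>>>
>>> (For reference, in Hadoop 1.x the HDFS block size is controlled by the dfs.block.size property, in bytes, e.g. 536870912 for 512MB. A rough sketch of a per-job override, assuming the examples accept generic -D options in this Hadoop version:
>>>
>>>   hadoop jar hadoop-examples.jar teragen -Ddfs.block.size=536870912 10000000000 /terasort/input
>>>
>>> Alternatively, the value can be set cluster-wide in hdfs-site.xml.)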
>> 
>> I suspect that this is part of it.  The default Ceph block size is only
>> 4MB, and the fact that the gap grows with larger HDFS blocks points the
>> same way.  I'm not sure the block size setting is properly wired up; it
>> depends on what version of the Hadoop bindings you are using.  Noah would
>> know more.
>> 
>> You can adjust the default block/object size for the fs with the cephfs
>> utility from a kernel mount.  There isn't yet a convenient way to do this
>> via ceph-fuse.
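>>
>> Roughly, from a kernel client mount (flag names as in "man cephfs"; sizes
>> in bytes; the path and values are just an example, and on older versions
>> you may need to apply the layout to a new, empty file rather than a
>> directory):
>>
>>   cephfs /mnt/ceph/terasort set_layout -u 67108864 -s 67108864 -c 1
>>   cephfs /mnt/ceph/terasort show_layout
>>
>> Newly created files under that path would then be striped into 64MB
>> objects; existing files keep their old layout.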
> 
> If Jutta is using the *old* bindings I last worked on in 2009, then this
> is already wired up for 64MB blocks. A "ceph pg dump" would let us get
> a rough estimate of the block sizes in use.
> 
> "ceph -s" would also be useful to check that everything is set up reasonably.
> 
> Other than that, it would be fair to describe these bindings as
> little-used: minimal performance tests indicated rough parity back in
> 2009, but those runs were only a couple of minutes long and on very small
> clusters, so a 1.2x factor might simply be normal. Noah and Joe are
> working on new bindings now, and those will be tuned, along with backend
> changes where necessary. They might also have a better eye for typical
> results.
> -Greg

