Sorry for the delay; I've been out on vacation...

On Fri, Dec 14, 2012 at 6:09 AM, Lachfeld, Jutta <jutta.lachfeld@xxxxxxxxxxxxxx> wrote:
> I do not have the full output of "ceph pg dump" for that specific TeraSort
> run, but here is a typical output after automatically preparing CEPH for a
> benchmark run (removed almost all lines in the long pg_stat table hoping
> that you do not need them):

Actually, those were exactly what I was after; they include the total PG size and the number of objects, so we can check the average object size. :) If you'd like to do it yourself, look at the PGs which correspond to your data pool (the PG ids are all of the form 0.123a, and the number before the decimal point is the pool ID; by default you'll be looking for 0). There's a rough script sketch at the end of this mail.

On Fri, Dec 14, 2012 at 6:53 AM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:
> The large block size may be an issue (at least with some of our default
> tunable settings). You might want to try 4 or 16MB and see if it's any
> better or worse.

Unless you've got a specific reason to think this is busted, I'm pretty confident it's not a problem. :)

Jutta, do you have any finer-grained numbers than total run time (specifically, how much time is spent on data generation versus the read-and-sort phase for each FS)? HDFS doesn't do any journaling the way Ceph does, and the fact that the Ceph journal is in-memory might not be helping much, since it's so small compared to the amount of data being written.

-Greg
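P.S. Since you asked about checking it yourself: here is a rough, untested
sketch that pulls per-pool object counts and average object size out of
"ceph pg dump" plain-text output. It assumes the pg_stat table has a header
row containing "objects" and "bytes" columns; the exact column names and
layout vary between Ceph versions, so treat it as a starting point rather
than anything authoritative.

#!/usr/bin/env python
# Rough sketch: per-pool average object size from "ceph pg dump".
# Assumes a header row naming "objects" and "bytes" columns; adjust
# for your Ceph version if the table layout differs.
import subprocess
from collections import defaultdict

out = subprocess.check_output(["ceph", "pg", "dump"]).decode()

cols = None
objects = defaultdict(int)
nbytes = defaultdict(int)

for line in out.splitlines():
    fields = line.split()
    if not fields:
        continue
    if fields[0].lower() == "pg_stat":   # header row of the pg_stat table
        cols = dict((name.lower(), i) for i, name in enumerate(fields))
        continue
    if cols is None or "." not in fields[0]:
        continue                         # not a per-PG row
    pool = fields[0].split(".")[0]       # pool id is the part before the dot
    try:
        objects[pool] += int(fields[cols["objects"]])
        nbytes[pool] += int(fields[cols["bytes"]])
    except (KeyError, IndexError, ValueError):
        continue                         # summary/footer rows, skip

for pool in sorted(objects):
    n = objects[pool]
    avg = nbytes[pool] / float(n) if n else 0.0
    print("pool %s: %d objects, %d bytes, avg %.1f MB/object"
          % (pool, n, nbytes[pool], avg / (1024 * 1024)))

For your data pool (pool 0 by default) the average should come out near
your write size if things are behaving as expected.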