On 07/24/2012 09:43 AM, Mehdi Abaakouk wrote:
Hi all,
I am currently running some tests on Ceph, more precisely on the RBD
and RADOSGW parts.
My goal is to gather performance metrics for different hardware and
Ceph setups. To do so, I am preparing a benchmark how-to to help
people compare their results.
I have started the how-to here: http://ceph.com/w/index.php?title=Benchmark
and linked it in the misc section of the main page.
So, a first question: is it all right if I continue publishing this
procedure on your wiki?
The how-to is not finished yet; this is only a first draft.
My test platform is not ready either, so the benchmark results can't
be used yet.
The next thing I will add to the how-to is some explanation of how to
interpret the benchmark results.
So if you have any comments, ideas for benchmarks, or anything else
that could help me improve the how-to and/or compare future results,
I would be glad to read them.
And thanks a lot for your work on Ceph; it's a great storage system :)
Best Regards,
Hi Mehdi,
Thanks for taking the time to put all of your benchmarking procedures
into writing! Having this kind of community participation is really
important for a project like Ceph. We use many of the same tools
internally, and personally I think it's fine to have it on the wiki.
I do want to stress that performance is going to be (hopefully!)
improving over the next couple of months, so we will probably want to
have updated results (or at least remove old results!) as things
improve. Also, I'm not sure if we will be keeping the wiki around in
its current form.
There was some talk about migrating to something else, but I don't
really remember the details.
Some comments:
- 60s is a pretty short test. You may get a more accurate
representation of throughput by running longer tests.
- Performance degradation on aged filesystems can be an issue, so you
may see different results if you run the test on a fresh filesystem vs
one that has already had a lot of data written to it.
- Depending on the number of OSDs you have, you may want to
explicitly set the number of PGs when creating the benchmarking pool
(see the example commands after this list).
- We also have a tool called "test_filestore_workloadgen" that lets
you exercise the filestore (data disk and journal) directly, which
can be useful when running strace/perf/valgrind tests.
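
As a concrete illustration of the test-length and PG-count points, a
minimal sequence might look like this (the pool name "bench", the PG
count of 256, and the 600s duration are only illustrative values to
adapt to your cluster):

  # create a dedicated pool with an explicit PG count
  # (256 is just an example; scale it with your OSD count)
  ceph osd pool create bench 256 256

  # 600 second write test with 16 concurrent ops, instead of the
  # default 60 seconds
  rados -p bench bench 600 write -t 16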
In addition, we have some scripts in our ceph-tools repo that may be
useful for anyone interested in performance profiling or
benchmarking. Specifically:
analysis/log_analyzer.py - lets you analyze where high-latency
requests spend their time, provided the debugging/tracker options are
turned on in the logs (see the example config after this list).
analysis/strace_parser.py - a rough tool for examining the frequency
of various write/read/etc. operations as reported by strace (see the
capture example after this list). It is useful for analyzing IO for
things other than Ceph as well, but it is still a work in progress.
aging/runtests.py - a tool we use internally for running rados bench
and rest bench on multiple clients. Eventually it may be folded into
our teuthology project, since much of the functionality overlaps. It
requires pdsh, collectl, blktrace, perf, and possibly some other
dependencies.
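
For log_analyzer.py, the logs need verbose debugging enabled. A
ceph.conf fragment along these lines is a reasonable starting point,
though the exact options and levels the script expects are an
assumption on my part and worth checking against the repo:

  [osd]
      ; verbose OSD, messenger, and filestore logging (assumed levels)
      debug osd = 20
      debug ms = 1
      debug filestore = 20
      debug journal = 20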
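
For strace_parser.py, a trace could be captured by attaching strace
to a running process; the PID and output file below are placeholders,
and the flag set is just one that produces per-syscall timing:

  # -f: follow forks, -tt: microsecond timestamps,
  # -T: time spent in each syscall
  strace -f -tt -T -o osd.trace -p <osd-pid>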
Thanks,
Mark
--
Mark Nelson
Performance Engineer
Inktank