Re: CBT: New RGW getput benchmark and testing diary

Mark Nelson <mnelson@xxxxxxxxxx> · Tue, 7 Feb 2017 08:47:49 -0600

Hi Orit,

This was a pull from master over the weekend:
5bf39156d8312d65ef77822fbede73fd9454591f

Btw, I've been noticing that when bucket index sharding is
used, there's a higher likelihood that client connection attempts are
delayed or starved out entirely under high concurrency.  I haven't
looked at the code yet.  Does this match what you'd expect to
happen?  I assume the threadpool is shared?

Mark

On 02/07/2017 07:50 AM, Orit Wasserman wrote:
Mark,
On what version did you run the tests?

Orit

On Mon, Feb 6, 2017 at 7:07 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:

On 02/06/2017 11:02 AM, Orit Wasserman wrote:

On Mon, Feb 6, 2017 at 5:44 PM, Matt Benjamin <mbenjamin@xxxxxxxxxx>
wrote:

Keep in mind, RGW does most of its request processing work in civetweb
threads, so high utilization there does not necessarily imply
civetweb-internal processing.

True, but the request processing is not a CPU-intensive operation.
It does seem to indicate that the civetweb threading model simply
doesn't scale (we have noticed this already), or maybe it points to
some locking issue. We need to run a profiler to understand what is
consuming CPU.
It may be a simple fix until we move to an asynchronous frontend.
It is worth investigating, as the CPU usage Mark is seeing is really high.

The initial profiling I did definitely showed a lot of tcmalloc threading
activity, which diminished after increasing threadcache.  This is quite
similar to what we saw in simplemessenger with low threadcache values,
though it is likely less true with async messenger.  Sadly a profiler like perf
probably isn't going to help much with debugging lock contention.  Grabbing
GDB stack traces might help, or lttng.
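
As a rough illustration of the GDB stack trace idea, here is a minimal sketch that groups threads by their top stack frames, so that many threads parked in the same lock call stand out.  It assumes the input is the text printed by something like gdb -p <pid> -batch -ex "thread apply all bt".  The function name top_frames and the SAMPLE text are made up for this example, and the frame regex is only an approximation (C++ symbols containing spaces would be cut short).

```python
import re
from collections import Counter

# Approximate match for a gdb frame line, e.g.
#   "#1  0x00007f... in pthread_mutex_lock () from /lib64/libpthread.so.0"
# or "#0  foo (x=1) at foo.c:12". Group 2 is the function name.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def top_frames(gdb_output, depth=3):
    """Count how many threads share the same top `depth` frames."""
    counts = Counter()
    current = None  # frames collected for the thread being parsed
    for line in gdb_output.splitlines():
        line = line.strip()
        if line.startswith("Thread "):
            # A new thread header closes the previous thread's stack.
            if current:
                counts[" <- ".join(current)] += 1
            current = []
            continue
        match = FRAME_RE.match(line)
        if match and current is not None and len(current) < depth:
            current.append(match.group(2))
    if current:
        counts[" <- ".join(current)] += 1
    return counts


# Hypothetical backtrace text, not taken from a real radosgw process.
SAMPLE = """
Thread 3 (Thread 0x7f01 (LWP 103)):
#0  0x00007f0000000001 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0000000002 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f0000000003 in worker_a () from /usr/bin/example

Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f0000000001 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f0000000002 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f0000000004 in worker_b () from /usr/bin/example

Thread 1 (Thread 0x7f03 (LWP 101)):
#0  0x00007f0000000005 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f0000000006 in main_loop () from /usr/bin/example
"""

if __name__ == "__main__":
    for signature, n in top_frames(SAMPLE, depth=2).most_common():
        print(n, signature)
```

Taking several such samples a second or so apart and summing the counts gives a crude picture of where threads are waiting, which is the kind of information perf tends not to show for lock contention.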

Mark,
How many concurrent requests were handled?

Most of the tests had 128 concurrent IOs per radosgw daemon.  The max thread
count was increased to 512.  It was very obvious when the thread count was
exceeded, since some getput processes would end up stalling and doing their
writes after the others, leading to bogus performance data.
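
To make the stalling effect concrete, here is a toy model, not a description of how civetweb or RGW actually schedule work.  It assumes every client connects at t=0, each request holds one thread for a fixed service time, and waiting clients are served in arrival order.  The function names and the 600-client figure are hypothetical; only the 128 and 512 values come from the tests above.

```python
def start_delays(num_clients, num_threads, service_time=1.0):
    """Delay before each client's request starts, under the toy model.

    Clients are served in waves of `num_threads`; wave k starts at
    k * service_time.
    """
    return [(i // num_threads) * service_time for i in range(num_clients)]


def stalled_clients(num_clients, num_threads, service_time=1.0):
    """Number of clients that have to wait for a free thread."""
    delays = start_delays(num_clients, num_threads, service_time)
    return sum(1 for d in delays if d > 0)


if __name__ == "__main__":
    # 128 concurrent IOs against 512 threads: nobody waits.
    print(stalled_clients(128, 512))
    # Hypothetical overload: 600 clients against 512 threads.
    print(stalled_clients(600, 512))
```

In the overloaded case the late starters run after the rest of the load has already finished, so their measured numbers no longer reflect the intended concurrency.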

Orit

Matt

----- Original Message -----

From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: "Matt Benjamin" <mbenjamin@xxxxxxxxxx>
Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx,
"Mark Seger" <mjseger@xxxxxxxxx>, "Kyle Bader" <kbader@xxxxxxxxxx>,
"Karan Singh" <karan@xxxxxxxxxx>, "Brent Compton" <bcompton@xxxxxxxxxx>
Sent: Monday, February 6, 2017 10:42:04 AM
Subject: Re: CBT: New RGW getput benchmark and testing diary

Just based on what I saw during these tests, it looks to me like a lot
more time was spent dealing with civetweb's threads than RGW.  I didn't
look too closely, but it may be worth looking at whether there's any low
hanging fruit in civetweb itself.

Mark

On 02/06/2017 09:33 AM, Matt Benjamin wrote:

Thanks for the detailed effort and analysis, Mark.

As we get closer to the L time-frame, it should become relevant to look at
the relative boost::asio frontend rework i/o paths, which are the open
effort to reduce CPU overhead/revise threading model, in general.

Matt

----- Original Message -----

From: "Mark Nelson" <mnelson@xxxxxxxxxx>
To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx
Cc: "Mark Seger" <mjseger@xxxxxxxxx>, "Kyle Bader" <kbader@xxxxxxxxxx>,
"Karan Singh" <karan@xxxxxxxxxx>, "Brent Compton" <bcompton@xxxxxxxxxx>
Sent: Monday, February 6, 2017 12:55:20 AM
Subject: CBT: New RGW getput benchmark and testing diary

Hi All,

Over the weekend I took a stab at improving our ability to run RGW
performance tests in CBT.  Previously the only way to do this was to use
the cosbench plugin, which required a fair amount of additional
setup and, while quite powerful, can be overkill in situations where you
want to rapidly iterate over tests looking for specific issues.  A while
ago Mark Seger from HP told me he had created a swift benchmark called
"getput" that is written in python and is much more convenient to run
quickly in an automated fashion.  Normally getput is used in conjunction
with gpsuite, a tool for coordinating benchmarking multiple getput
processes.  This is how you would likely use getput on a typical ceph or
swift cluster, but since CBT builds the cluster and has its own way of
launching multiple benchmark processes, it uses getput directly.

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html