Mark,

On what version did you run the tests?

Orit

On Mon, Feb 6, 2017 at 7:07 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>
>
> On 02/06/2017 11:02 AM, Orit Wasserman wrote:
>>
>> On Mon, Feb 6, 2017 at 5:44 PM, Matt Benjamin <mbenjamin@xxxxxxxxxx>
>> wrote:
>>>
>>> Keep in mind, RGW does most of its request processing work in civetweb
>>> threads, so high utilization there does not necessarily imply
>>> civetweb-internal processing.
>>>
>>
>> True, but request processing is not a CPU-intensive operation.
>> It does seem to indicate that the civetweb threading model simply
>> doesn't scale (we noticed this already), or it may point to a locking
>> issue. We need to run a profiler to understand what is consuming CPU.
>> It may be a simple fix until we move to the asynchronous frontend.
>> It's worth investigating, as the CPU usage Mark is seeing is really
>> high.
>
>
> The initial profiling I did definitely showed a lot of tcmalloc
> threading activity, which diminished after increasing the thread cache.
> This is quite similar to what we saw in simplemessenger with low
> threadcache values, though it is likely less true with the async
> messenger. Sadly, a profiler like perf probably isn't going to help
> much with debugging lock contention. Grabbing GDB stack traces might
> help, or lttng.
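(A minimal sketch, not from the thread, of the thread-cache adjustment Mark
describes: tcmalloc's aggregate thread-cache ceiling can be raised through
the standard gperftools environment variable before radosgw starts. The
daemon name/flags and the 128MB value below are illustrative assumptions.)

    import os
    import subprocess

    # Raise tcmalloc's total thread-cache ceiling (gperftools default is
    # 32MB) before launching radosgw -- the same knob commonly set for
    # OSDs via /etc/sysconfig/ceph. Daemon name and flags illustrative.
    env = dict(os.environ)
    env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"] = str(128 * 1024 * 1024)
    subprocess.Popen(["radosgw", "-f", "-n", "client.rgw.gateway"], env=env)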
>
>>
>> Mark,
>> How many concurrent requests were handled?
>
>
> Most of the tests had 128 concurrent IOs per radosgw daemon. The max
> thread count was increased to 512. It was very obvious when exceeding
> the thread count, since some getput processes would end up stalling and
> doing their writes after the others, leading to bogus performance data.
>
>
>>
>> Orit
>>
>>> Matt
>>>
>>> ----- Original Message -----
>>>>
>>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>>> To: "Matt Benjamin" <mbenjamin@xxxxxxxxxx>
>>>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx,
>>>> "Mark Seger" <mjseger@xxxxxxxxx>, "Kyle Bader" <kbader@xxxxxxxxxx>,
>>>> "Karan Singh" <karan@xxxxxxxxxx>, "Brent Compton" <bcompton@xxxxxxxxxx>
>>>> Sent: Monday, February 6, 2017 10:42:04 AM
>>>> Subject: Re: CBT: New RGW getput benchmark and testing diary
>>>>
>>>> Just based on what I saw during these tests, it looks to me like a
>>>> lot more time was spent dealing with civetweb's threads than with RGW
>>>> itself. I didn't look too closely, but it may be worth looking at
>>>> whether there's any low-hanging fruit in civetweb itself.
>>>>
>>>> Mark
>>>>
>>>> On 02/06/2017 09:33 AM, Matt Benjamin wrote:
>>>>>
>>>>> Thanks for the detailed effort and analysis, Mark.
>>>>>
>>>>> As we get closer to the L time-frame, it should become relevant to
>>>>> look at the related boost::asio frontend rework I/O paths, which are
>>>>> the open effort to reduce CPU overhead and revise the threading
>>>>> model in general.
>>>>>
>>>>> Matt
>>>>>
>>>>> ----- Original Message -----
>>>>>>
>>>>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>>>>> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx
>>>>>> Cc: "Mark Seger" <mjseger@xxxxxxxxx>, "Kyle Bader" <kbader@xxxxxxxxxx>,
>>>>>> "Karan Singh" <karan@xxxxxxxxxx>, "Brent Compton" <bcompton@xxxxxxxxxx>
>>>>>> Sent: Monday, February 6, 2017 12:55:20 AM
>>>>>> Subject: CBT: New RGW getput benchmark and testing diary
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Over the weekend I took a stab at improving our ability to run RGW
>>>>>> performance tests in CBT. Previously the only way to do this was to
>>>>>> use the cosbench plugin, which required a fair amount of additional
>>>>>> setup and, while quite powerful, can be overkill in situations
>>>>>> where you want to rapidly iterate over tests looking for specific
>>>>>> issues. A while ago Mark Seger from HP told me he had created a
>>>>>> swift benchmark called "getput" that is written in python and is
>>>>>> much more convenient to run quickly in an automated fashion.
>>>>>> Normally getput is used in conjunction with gpsuite, a tool for
>>>>>> coordinating benchmarking across multiple getput processes. This is
>>>>>> how you would likely use getput on a typical ceph or swift cluster,
>>>>>> but since CBT builds the cluster and has its own way of launching
>>>>>> multiple benchmark processes, it uses getput directly.
>>>>>>
>>>
>>> --
>>> Matt Benjamin
>>> Red Hat, Inc.
>>> 315 West Huron Street, Suite 140A
>>> Ann Arbor, Michigan 48103
>>>
>>> http://www.redhat.com/en/technologies/storage
>>>
>>> tel. 734-821-5101
>>> fax. 734-769-8938
>>> cel. 734-216-5309
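For anyone wanting to reproduce the direct-launch setup described above,
here is a rough sketch (not from the thread) of fanning out getput
processes the way CBT does, bypassing gpsuite. It assumes swift-style
credentials are already exported, and the flag spellings are illustrative
assumptions -- verify them against "getput -h" for your version.

    import subprocess

    # One getput process per simulated client, no gpsuite. Assumes
    # swift-style auth (e.g. ST_AUTH/ST_USER/ST_KEY) is already in the
    # environment; flag spellings below are illustrative.
    workers = []
    for i in range(4):                     # 4 concurrent workers
        workers.append(subprocess.Popen([
            "getput",
            "-c", "cont%d" % i,            # separate container per worker
            "-o", "obj",                   # object name prefix
            "-s", "4k",                    # object size
            "-t", "p",                     # test type: p=put, g=get, d=delete
            "-n", "1000",                  # objects per worker
        ]))
    for w in workers:
        w.wait()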