Re: CBT: New RGW getput benchmark and testing diary

On Tue, Feb 7, 2017 at 4:47 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> Hi Orit,
>
> This was a pull from master over the weekend:
> 5bf39156d8312d65ef77822fbede73fd9454591f
>
> Btw, I've been noticing that when bucket index sharding is used, there
> seems to be a higher likelihood that client connection attempts are delayed
> or starved out entirely under high concurrency.  I haven't looked at the
> code yet; does this match what you'd expect to happen?  I assume the
> threadpool is shared?
>
Yes, it is shared.
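
For reference, these are roughly the knobs involved on the RGW side (just a
sketch; the thread counts match what Mark described, and the shard count is
only an example, not a recommendation):

# ceph.conf, in the rgw client section
# civetweb worker threads (the shared pool handling client connections)
rgw frontends = civetweb port=7480 num_threads=512
# RGW's internal request thread pool
rgw thread pool size = 512
# bucket index shards for newly created buckets (example value)
rgw override bucket index max shards = 8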

> Mark
>
>
> On 02/07/2017 07:50 AM, Orit Wasserman wrote:
>>
>> Mark,
>> On what version did you run the tests?
>>
>> Orit
>>
>> On Mon, Feb 6, 2017 at 7:07 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>>>
>>>
>>>
>>> On 02/06/2017 11:02 AM, Orit Wasserman wrote:
>>>>
>>>>
>>>> On Mon, Feb 6, 2017 at 5:44 PM, Matt Benjamin <mbenjamin@xxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>> Keep in mind, RGW does most of its request processing work in civetweb
>>>>> threads, so high utilization there does not necessarily imply
>>>>> civetweb-internal processing.
>>>>>
>>>>
>>>> True, but request processing is not a CPU-intensive operation.
>>>> It does seem to indicate that the civetweb threading model simply
>>>> doesn't scale (we have noticed that already), or it may point to
>>>> some locking issue. We need to run a profiler to understand what is
>>>> consuming CPU.
>>>> It may be a simple fix until we move to the asynchronous frontend.
>>>> It is worth investigating, as the CPU usage Mark is seeing is really high.
>>>
>>>
>>>
>>> The initial profiling I did definitely showed a lot of tcmalloc threading
>>> activity, which diminished after increasing the threadcache.  This is
>>> quite similar to what we saw in simplemessenger with low threadcache
>>> values, though it is likely less true with the async messenger.  Sadly, a
>>> profiler like perf probably isn't going to help much with debugging lock
>>> contention; grabbing GDB stack traces might help, or lttng.
>>>
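
On the GDB idea, a rough sketch of the kind of "poor man's profiler" sampling
that might show where the threads are parked (assumes gdb can attach to the
radosgw pid given on the command line; the frame names counted here are just
examples):

#!/usr/bin/env python
# Sketch: snapshot every radosgw thread's backtrace with gdb and count how
# many frames sit in common lock/wait calls.  Usage: pmp.py <radosgw-pid>
import subprocess
import sys
from collections import Counter

def sample_stacks(pid):
    # "thread apply all bt" dumps a backtrace for every thread in the process.
    out = subprocess.check_output(
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        stderr=subprocess.STDOUT)
    return out.decode("utf-8", "replace")

def count_wait_frames(trace):
    # Frames that usually mean a thread is parked on a lock or condvar.
    markers = ("pthread_mutex_lock", "pthread_cond_wait", "futex_wait")
    hits = Counter()
    for line in trace.splitlines():
        for marker in markers:
            if marker in line:
                hits[marker] += 1
    return hits

if __name__ == "__main__":
    for marker, count in count_wait_frames(sample_stacks(sys.argv[1])).most_common():
        print("%6d frames in %s" % (count, marker))
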
>>>>
>>>> Mark,
>>>> How many concurrent requests were handled?
>>>
>>>
>>>
>>> Most of the tests had 128 concurrent IOs per radosgw daemon.  The max
>>> thread count was increased to 512.  It was very obvious when exceeding
>>> the thread count, since some getput processes would end up stalling and
>>> doing their writes after others, leading to bogus performance data.
>>>
>>>
>>>>
>>>> Orit
>>>>
>>>>> Matt
>>>>>
>>>>> ----- Original Message -----
>>>>>>
>>>>>>
>>>>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>>>>> To: "Matt Benjamin" <mbenjamin@xxxxxxxxxx>
>>>>>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx,
>>>>>> "Mark
>>>>>> Seger" <mjseger@xxxxxxxxx>, "Kyle Bader"
>>>>>> <kbader@xxxxxxxxxx>, "Karan Singh" <karan@xxxxxxxxxx>, "Brent Compton"
>>>>>> <bcompton@xxxxxxxxxx>
>>>>>> Sent: Monday, February 6, 2017 10:42:04 AM
>>>>>> Subject: Re: CBT: New RGW getput benchmark and testing diary
>>>>>>
>>>>>> Just based on what I saw during these tests, it looks to me like a lot
>>>>>> more time was spent dealing with civetweb's threads than in RGW.  I
>>>>>> didn't look too closely, but it may be worth looking at whether there's
>>>>>> any low-hanging fruit in civetweb itself.
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>> On 02/06/2017 09:33 AM, Matt Benjamin wrote:
>>>>>>>
>>>>>>>
>>>>>>> Thanks for the detailed effort and analysis, Mark.
>>>>>>>
>>>>>>> As we get closer to the L time frame, it should become relevant to
>>>>>>> look at the boost::asio frontend rework I/O paths, which are the open
>>>>>>> effort to reduce CPU overhead and revise the threading model in
>>>>>>> general.
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>>
>>>>>>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>>>>>>> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, cbt@xxxxxxxxxxxxxx
>>>>>>>> Cc: "Mark Seger" <mjseger@xxxxxxxxx>, "Kyle Bader"
>>>>>>>> <kbader@xxxxxxxxxx>,
>>>>>>>> "Karan Singh" <karan@xxxxxxxxxx>, "Brent
>>>>>>>> Compton" <bcompton@xxxxxxxxxx>
>>>>>>>> Sent: Monday, February 6, 2017 12:55:20 AM
>>>>>>>> Subject: CBT: New RGW getput benchmark and testing diary
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Over the weekend I took a stab at improving our ability to run RGW
>>>>>>>> performance tests in CBT.  Previously the only way to do this was to
>>>>>>>> use the cosbench plugin, which required a fair amount of additional
>>>>>>>> setup and, while quite powerful, can be overkill in situations where
>>>>>>>> you want to rapidly iterate over tests looking for specific issues.
>>>>>>>> A while ago Mark Seger from HP told me he had created a swift
>>>>>>>> benchmark called "getput" that is written in python and is much more
>>>>>>>> convenient to run quickly in an automated fashion.  Normally getput
>>>>>>>> is used in conjunction with gpsuite, a tool for coordinating
>>>>>>>> benchmark runs across multiple getput processes.  That is how you
>>>>>>>> would likely use getput on a typical ceph or swift cluster, but since
>>>>>>>> CBT builds the cluster and has its own way of launching multiple
>>>>>>>> benchmark processes, it uses getput directly.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Matt Benjamin
>>>>> Red Hat, Inc.
>>>>> 315 West Huron Street, Suite 140A
>>>>> Ann Arbor, Michigan 48103
>>>>>
>>>>> http://www.redhat.com/en/technologies/storage
>>>>>
>>>>> tel.  734-821-5101
>>>>> fax.  734-769-8938
>>>>> cel.  734-216-5309
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


