Hi all,

I've run some experiments to measure the throughput of the messenger
implementation before and after the refactoring, using the
ceph_perf_msgr_client/server test applications
(http://docs.ceph.com/docs/master/dev/messenger/). These test
applications exercise only the messenger code.

The commits I tested were the following:

PRE:    fed335ff473fa49a21fc69889008425c207460ed
        commit before the first refactoring of messenger V1
POST:   e6f043f7d2dc655629199231f0312ad6a6cf6be6
        commit of the first refactoring of messenger V1
POST2:  bf12a6de2ba87b8f0e61770db58fe9f3b1794027
        commit that fixed a stack overflow introduced by the refactoring
        of messenger V1, by using continuations to implement the
        protocol state machine (a sketch of this pattern appears at the
        end of this message, below the quoted thread)
MASTER: f2db4bc8810166a0381a7c01c859d60f91d6a94d
        the tip of master as of last Friday

For each of the above commits I ran the ceph_perf_msgr_* test
applications on the loopback device. The server was always launched
with the following command:

  ./bin/ceph_perf_msgr_server 127.0.0.1:10001 1 0

The client was launched with the following command:

  ./bin/ceph_perf_msgr_client 127.0.0.1:10001 1 128 10000 0 $BS

where $BS ranged over 4096 (4K), 1048576 (1M), and 4194304 (4M).

For each (commit, $BS) configuration I ran the client 5 times without
restarting the server. The output of each client run is the total time,
in microseconds, that it took to execute the 10000 operations. I
collected results for all of the above configurations using binaries
compiled in Release mode and in Debug mode.

Here are the results for the *Release mode* build (each result is the
average of the 5 runs):

  BS    PRE        POST       POST2      MASTER
  4K     1443746    1469349    1457197    1442658
  1M     4325499    7466451    4355057    4352690
  4M    13734228   33607299   13776834   13639610

In terms of slowdowns with respect to PRE we have:

  BS    POST/PRE   POST2/PRE   MASTER/PRE
  4K    1.02       1.01        1.00
  1M    1.73       1.01        0.99
  4M    2.45       1.00        1.01

From the slowdown table we see that there is almost no slowdown between
the commit preceding the refactoring (PRE) and master. The only
significant slowdown is in the first refactoring commit (POST), which
was later fixed by the POST2 commit.

Now the results for the *Debug mode* build (each result is the average
of the 5 runs):

  BS    PRE        POST       POST2      MASTER
  4K     1980060    2144851    2022355    2051548
  1M     4753528    7028510    4919022    4882254
  4M    14270438   29091470   14419567   14442135

In terms of slowdowns with respect to PRE we have:

  BS    POST/PRE   POST2/PRE   MASTER/PRE
  4K    1.08       1.02        0.97
  1M    1.48       1.03        0.97
  4M    2.04       1.01        0.99

Once again, the table above shows no slowdown between PRE and master,
even when the binaries are built in Debug mode; the behavior in Debug
mode closely mirrors that of Release mode.

In summary, from the results above it looks like the performance
regression observed when running Ceph in Debug mode does not come from
the changes to the messenger code; the slowdown likely comes from an
upper layer of the application stack. It might be worthwhile to check
the ceph::mutex changes, as suggested by Sage.
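As a rough illustration of that concern, here is a minimal sketch; the
names (TrackedMutex, debug_mutex, sketch_mutex) are mine, not the actual
Ceph API. If the old wrapper could opt out of lockdep per instance,
while the new alias turns lockdep on for every mutex in debug builds,
then hot-path locks that used to skip the bookkeeping now pay for it,
and only Debug builds would notice:

  // Hypothetical sketch of the pattern Sage describes -- TrackedMutex,
  // debug_mutex and sketch_mutex are illustrative names, not Ceph's API.
  #include <mutex>

  // Old style: lockdep bookkeeping is opt-in per instance, so hot-path
  // locks could be constructed with lockdep disabled.
  class TrackedMutex {
    std::mutex m_;
    const bool lockdep_;  // false => no tracking cost on lock/unlock
   public:
    explicit TrackedMutex(bool lockdep = true) : lockdep_(lockdep) {}
    void lock() {
      if (lockdep_) { /* record acquisition order, detect cycles, ... */ }
      m_.lock();
    }
    void unlock() {
      m_.unlock();
      if (lockdep_) { /* pop from the held-locks bookkeeping */ }
    }
  };

  // New style: the type alias decides per build, not per instance. In a
  // debug build every mutex becomes the tracking variant, including
  // locks that previously passed "false" for lockdep.
  class debug_mutex : public TrackedMutex {
   public:
    debug_mutex() : TrackedMutex(/*lockdep=*/true) {}
  };

  #ifdef NDEBUG
  using sketch_mutex = std::mutex;   // release build: plain mutex
  #else
  using sketch_mutex = debug_mutex;  // debug build: lockdep always on
  #endif

If that is what happened, grepping for Mutex constructions that passed
lockdep=false before the conversion might be a quick way to confirm.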
cheers,
Ricardo

On 25/10/18 13:26, Sage Weil wrote:
> Has anyone tested the commits before and after the merge of the protocol
> refactor and confirmed that that is in fact the culprit? The Mutex ->
> ceph::mutex changes that have been trickling in might also be responsible,
> as those may have accidentally changed some Mutex(.., false for no
> lockdep) to ceph::mutex (always lockdep), which would also affect the
> debug builds only.
>
> sage
>
> On Thu, 25 Oct 2018, Yan, Zheng wrote:
>> On Thu, Oct 25, 2018 at 7:17 AM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
>>>
>>> To clarify, what is the result with "-g -O2"?
>>>
>>
>> There is no slowdown in RelWithDebInfo mode.
>>
>>> Matt
>>>
>>> On Wed, Oct 24, 2018 at 5:59 PM, Ricardo Dias <rdias@xxxxxxxx> wrote:
>>>>
>>>> On 24/10/2018 21:54, Gregory Farnum wrote:
>>>>> Do we understand why debug mode got so much slower? Is there something
>>>>> we can do to improve it?
>>>>
>>>> I believe the reason for the slowdown is the increase in the number of
>>>> functions used in the new implementation. While the previous
>>>> implementation built the state machine from just two big functions
>>>> (with a switch/case block in each), the new implementation uses one
>>>> function per protocol state.
>>>> I'm not familiar with what the compiler generates in Debug mode, but I
>>>> imagine there are now many more debug symbols to track, and fewer
>>>> optimizations the compiler can perform without confusing the debugger
>>>> tools.
>>>>
>>>> I currently don't see a way to improve the performance in Debug mode.
>>>> One thing we can do, though, is also check the performance when
>>>> compiling in RelWithDebInfo mode. If it performs similarly to Release
>>>> mode, at least we still have debug symbols to help in identifying
>>>> problems.
>>>>
>>>>> We are, for instance, seeing new issues with the messenger in our
>>>>> testing, apparently because the reduced speed opens up race conditions
>>>>> much wider. In this case that's good for us, but it could easily go
>>>>> the other way as well, and I'm concerned about not finding new issues
>>>>> in our testing if the difference is so substantial compared to what
>>>>> will be deployed by users.
>>>>
>>>> Maybe we can build packages for the binaries compiled in both modes
>>>> (Debug and Release) and allow specifying which one to use in each test
>>>> run.
>>>>
>>>>> -Greg
>>>>> On Wed, Oct 24, 2018 at 3:18 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Only Ceph compiled in debug mode has the regression. Ceph compiled in
>>>>>> release mode has no regression. Sorry for the noise.
>>>>>>
>>>>>> Yan, Zheng
>>>>>>
>>>>>> On Wed, Oct 24, 2018 at 1:46 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yesterday I checked how fast ceph-mds can process requests (a client
>>>>>>> keeps sending getattr requests for the root inode). The request rate
>>>>>>> I got is only about half of the same test I did a few weeks ago. A
>>>>>>> perf profile of ceph-mds shows that messenger functions used more
>>>>>>> CPU time compared to the mimic code. Performance results and perf
>>>>>>> profiles are at http://tracker.ceph.com/issues/36561.
>>>>>>>
>>>>>>> Regards
>>>>>>> Yan, Zheng
>>>>
>>>> --
>>>> Ricardo Dias
>>>> Senior Software Engineer - Storage Team
>>>> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
>>>> HRB 21284
>>>> (AG Nürnberg)
>>>
>>> --
>>> Matt Benjamin
>>> Red Hat, Inc.
>>> 315 West Huron Street, Suite 140A
>>> Ann Arbor, Michigan 48103
>>>
>>> http://www.redhat.com/en/technologies/storage
>>>
>>> tel. 734-821-5101
>>> fax. 734-769-8938
>>> cel. 734-216-5309

--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)
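As referenced in the POST2 commit description above, and tying into the
quoted discussion of per-state functions: below is a minimal sketch of
the continuation ("trampoline") idea. It is not the actual Ceph
messenger code; Continuation, read_header and read_payload are
hypothetical names. Each state handler returns the next step instead of
calling it directly, so a flat driver loop runs the protocol with
constant stack depth, whereas a direct call chain between per-state
handlers is the pattern that can overflow the stack:

  // Minimal trampoline sketch -- Continuation, read_header and
  // read_payload are hypothetical names, not the Ceph messenger V1 code.
  #include <functional>
  #include <iostream>

  // A continuation wraps "the next state handler to run"; an empty
  // std::function signals that the protocol is finished.
  struct Continuation {
    std::function<Continuation()> next;
  };

  Continuation read_payload(int remaining);

  Continuation read_header() {
    std::cout << "read header\n";
    return {[] { return read_payload(3); }};
  }

  Continuation read_payload(int remaining) {
    if (remaining == 0)
      return {};  // done: empty continuation ends the driver loop
    std::cout << "read payload chunk, " << remaining << " left\n";
    // Return the next step instead of recursing into it; with a direct
    // call here, each chunk of a long message would add a stack frame.
    return {[remaining] { return read_payload(remaining - 1); }};
  }

  int main() {
    // The driver loop ("trampoline"): stack depth stays constant no
    // matter how many state transitions the protocol goes through.
    Continuation c = read_header();
    while (c.next)
      c = c.next();
  }

If the POST2 fix works along these lines, that would also fit the
numbers above: the loop adds one indirect call per transition but no
unbounded stack growth, and POST2 recovers PRE-level throughput.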