Re: Profiling discussion

thierry bordaz <tbordaz@xxxxxxxxxx> · Thu, 25 Oct 2018 09:46:38 +0200

    On 10/11/2018 12:57 AM, William Brown
      wrote:

      On Wed, 2018-10-10 at 16:26 +0200, thierry bordaz wrote:

        Hi William,

Thanks for starting this discussion.
Your email raises several aspects (how, for whom, ...) and I think a way
to start would be to write down what we want.
One need is to determine, for a given workload, where we are spending
time, as a way to decide where to invest.
Another need is to collect metrics at the operation level.

      Aren't these very similar? The time we invest is generally on improving
a plugin or a small part of an operation, to make the operation as a
whole faster.

    The tools used may be similar, but the difference is in the
    expected results. For example, I was just discussing with a
    user who reported:

    [24/Oct/2018:12:10:55.012908141 -0800] conn=2400 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"
    [24/Oct/2018:12:11:01.711604553 -0800] conn=2400 op=1 RESULT err=1 tag=109 nentries=0 etime=6.1301230184 csn=5bd0d1cf000000010000

        I tried the same modrdn in my test environment, which resulted
        in no error and no latency.

    [24/Oct/2018:12:14:03.665479821 -0800] conn=138 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"
    [24/Oct/2018:12:14:03.749121724 -0800] conn=138 op=1 RESULT err=0 tag=109 nentries=0 etime=0.0083774655 csn=5bd0d28b000000010000

    So here the expected result is not to improve performance but
    to have a diagnostic method/tool to know what is going on in
    production compared to tests.
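
As a rough illustration of such a diagnostic, the two excerpts above can be compared mechanically. The sketch below is not an existing 389-ds tool: the names `LINE_RE`, `to_ns` and `summarize` are hypothetical, and it assumes each access-log entry sits on a single line in the format shown above. It pairs each request with its RESULT by (conn, op) and reports the wall-clock gap between the two timestamps next to the logged err and etime values.

```python
import re
from datetime import datetime

# One access-log entry, assumed to be on a single line as in the excerpts above.
LINE_RE = re.compile(
    r"\[(?P<ts>[^.\]]+)\.(?P<frac>\d+) (?P<tz>[+-]\d{4})\] "
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) (?P<kind>\w+)(?P<rest>.*)"
)


def to_ns(ts, frac, tz):
    """Convert a bracketed log timestamp to integer nanoseconds since the epoch."""
    whole = datetime.strptime(f"{ts} {tz}", "%d/%b/%Y:%H:%M:%S %z")
    return int(whole.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def summarize(lines):
    """Pair each request with its RESULT and report the gap between them."""
    pending, out = {}, []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a conn=/op= entry
        key = (int(m["conn"]), int(m["op"]))
        now = to_ns(m["ts"], m["frac"], m["tz"])
        if m["kind"] != "RESULT":
            pending[key] = (now, m["kind"])  # remember when the request was logged
            continue
        start = pending.pop(key, None)
        if start is None:
            continue  # RESULT whose request line was not seen
        fields = dict(f.split("=", 1) for f in m["rest"].split() if "=" in f)
        out.append({
            "conn": key[0], "op": key[1], "type": start[1],
            "err": int(fields["err"]),
            "etime": fields["etime"],      # kept verbatim as logged
            "wall_ns": now - start[0],     # gap between the two log timestamps
        })
    return out


PRODUCTION = [
    '[24/Oct/2018:12:10:55.012908141 -0800] conn=2400 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"',
    '[24/Oct/2018:12:11:01.711604553 -0800] conn=2400 op=1 RESULT err=1 tag=109 nentries=0 '
    'etime=6.1301230184 csn=5bd0d1cf000000010000',
]
TEST_ENV = [
    '[24/Oct/2018:12:14:03.665479821 -0800] conn=138 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"',
    '[24/Oct/2018:12:14:03.749121724 -0800] conn=138 op=1 RESULT err=0 tag=109 nentries=0 '
    'etime=0.0083774655 csn=5bd0d28b000000010000',
]

if __name__ == "__main__":
    for rec in summarize(PRODUCTION + TEST_ENV):
        print(rec)
```

This only shows that the production operation was slow and failed; it says nothing about where the time went inside the server, which is the gap the discussion is about.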

      So if we can report on an individual operation, we can write a tool
similar to logconv.pl, but for performance metrics, that displays
trends of operations that are not performing well. Then we can find
examples of such operations and work out why.
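
A trend report in that spirit could start from something like the sketch below. It is only an assumption about what such a tool might do, not an existing script: `trend_report` and the `slow_after` threshold are hypothetical names, and the input is assumed to be access-log lines in the format quoted earlier in this thread. It groups RESULT lines by operation type, counts the slow ones based on the logged etime, and keeps the conn/op of each slow operation as an example to investigate.

```python
import re
from collections import defaultdict

# Request line: conn/op followed by an upper-case operation type other than RESULT.
REQ_RE = re.compile(r"conn=(\d+) op=(\d+) (?!RESULT)([A-Z]+)")
# RESULT line: conn/op plus the logged etime (seconds).
RES_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT .*?etime=([0-9.]+)")


def trend_report(lines, slow_after=1.0):
    """Return per-operation-type counts, slow counts, worst etime and examples."""
    op_types = {}  # (conn, op) -> operation type seen on the request line
    stats = defaultdict(
        lambda: {"count": 0, "slow": 0, "worst": 0.0, "examples": []}
    )
    for line in lines:
        res = RES_RE.search(line)
        if res:
            key = (res[1], res[2])
            entry = stats[op_types.pop(key, "UNKNOWN")]
            etime = float(res[3])
            entry["count"] += 1
            entry["worst"] = max(entry["worst"], etime)
            if etime >= slow_after:
                entry["slow"] += 1
                entry["examples"].append(f"conn={key[0]} op={key[1]}")
            continue
        req = REQ_RE.search(line)
        if req:
            op_types[(req[1], req[2])] = req[3]
    return dict(stats)


SAMPLE_LOG = [
    '[24/Oct/2018:12:10:55.012908141 -0800] conn=2400 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"',
    '[24/Oct/2018:12:11:01.711604553 -0800] conn=2400 op=1 RESULT err=1 tag=109 nentries=0 '
    'etime=6.1301230184 csn=5bd0d1cf000000010000',
    '[24/Oct/2018:12:14:03.665479821 -0800] conn=138 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"',
    '[24/Oct/2018:12:14:03.749121724 -0800] conn=138 op=1 RESULT err=0 tag=109 nentries=0 '
    'etime=0.0083774655 csn=5bd0d28b000000010000',
]

if __name__ == "__main__":
    for op_type, entry in trend_report(SAMPLE_LOG).items():
        print(op_type, entry)
```

The examples list is the useful part for diagnosis: it points back at specific conn/op pairs whose per-operation details could then be inspected.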

         From the "how" perspective, we can rely on external tools
(stap + scripts) or on an internal tool (like the plugin you described +
scripts). Of course we can also make some enhancements inside DS (like
adding probes) to help external tools. I have no strong opinion on
whether one approach is better than the other, but I think it also
depends on what you want to do.

      I think it would be great if the tools we use internally in the
team were also accessible to admins of DS. That way, when we get
reports of performance concerns, we have a standardised way of looking
at them. It means our workflow is the same for internal development
and profiling as for external reports, and it will force us to have
all the information we need in that one place.

I think that, as a coarse first metric, internal event timings are
probably what we want first. After that we can continue to extend from there?

As for the how, perhaps we can put something on the Operation struct
for appending and logging events and turning those into metrics? 
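
To make the "turning those into metrics" step concrete, here is a minimal sketch of the consumer side. Everything about the record layout is an assumption made for illustration (a JSON object per operation with an `events` list, each event carrying `name`, `start_ns` and `end_ns`); nothing like it exists in the server today, and `stage_metrics` is a hypothetical name.

```python
import json
from collections import defaultdict


def stage_metrics(json_blobs):
    """Aggregate per-event timings across operations.

    Each blob is assumed (hypothetical layout) to look like:
    {"conn": 1, "op": 1, "events": [{"name": ..., "start_ns": ..., "end_ns": ...}]}
    """
    totals = defaultdict(lambda: {"calls": 0, "total_ns": 0, "max_ns": 0})
    for blob in json_blobs:
        record = json.loads(blob)
        for event in record["events"]:
            elapsed = event["end_ns"] - event["start_ns"]
            entry = totals[event["name"]]
            entry["calls"] += 1
            entry["total_ns"] += elapsed
            entry["max_ns"] = max(entry["max_ns"], elapsed)
    return dict(totals)


# Made-up sample records; the event names and timings are illustrative only.
SAMPLE_EVENTS = [
    '{"conn": 1, "op": 1, "events": ['
    '{"name": "plugin_a", "start_ns": 0, "end_ns": 2000}, '
    '{"name": "backend", "start_ns": 2000, "end_ns": 9000}]}',
    '{"conn": 1, "op": 2, "events": ['
    '{"name": "backend", "start_ns": 0, "end_ns": 3000}]}',
]

if __name__ == "__main__":
    for name, entry in sorted(stage_metrics(SAMPLE_EVENTS).items()):
        print(name, entry)
```

Keeping the aggregation outside the server means the logged events stay raw, and the choice of metrics (totals, maxima, percentiles) can change without touching the server.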

As mentioned, you could use stap too, with defined points for tracing,
but that limits us to Linux only?

        best regards
thierry

On 10/08/2018 12:37 PM, William Brown wrote:

          Hi there,

In a ticket Thierry and I mentioned that we should have a quick
discussion about ideas for profiling and what we want it to look
like and what we need. I think it’s important we improve our
visibility into the server so that we can target improvements
correctly.

I think we should know:

* Who is the target audience to run our profiling tools?
* What kind of information do we want?
* Potential solutions for the above.

With those in mind I think that Thierry suggested STAP scripts.

* Target audience - developers (us) and some “highly experienced”
admins (STAP is not the easiest thing to run).
* Information - STAP would largely tell us timing and possibly
allow some variable/struct extraction. STAP also lets us look
at connection info a bit more easily.

I would suggest an “event” struct and a logging service.

At the start of an operation we create an event struct. As we enter
and exit a plugin we can append timing information, and the plugin
itself can add details (for example, the backend could add idl
performance metrics or other data). At the end of the operation, we
log the event struct as a JSON blob to our access log, associated
with the conn/op.
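
The flow described above can be sketched as follows. This is only an illustration of the idea, written in Python for brevity rather than as the server's C code; `OperationEvents`, `span` and the `idl_lookups` detail are hypothetical names, and the JSON layout is an assumption rather than an agreed format.

```python
import json
import time
from contextlib import contextmanager


class OperationEvents:
    """Per-operation event record (hypothetical stand-in for the event struct)."""

    def __init__(self, conn, op, clock=time.monotonic_ns):
        self.conn, self.op = conn, op
        self._clock = clock
        self._origin = clock()  # creation time; event offsets are relative to it
        self.events = []

    @contextmanager
    def span(self, name):
        """Time one plugin call; the caller may fill in the yielded details dict."""
        event = {
            "name": name,
            "start_ns": self._clock() - self._origin,
            "details": {},
        }
        try:
            yield event["details"]
        finally:
            # Events are appended on exit, so nested spans are logged inner-first.
            event["end_ns"] = self._clock() - self._origin
            self.events.append(event)

    def to_json(self):
        """Serialise the record as one JSON blob, keyed by conn/op."""
        return json.dumps(
            {"conn": self.conn, "op": self.op, "events": self.events},
            sort_keys=True,
        )


def demo():
    events = OperationEvents(conn=2400, op=1)
    with events.span("example_plugin"):
        pass  # plugin work would happen here
    with events.span("backend") as details:
        details["idl_lookups"] = 3  # made-up metric, for illustration only
    return events.to_json()


if __name__ == "__main__":
    print(demo())
```

Recording offsets relative to the creation of the record keeps the blob small and makes operations directly comparable; the absolute time is already in the access-log timestamp.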

* Target - anyone, it’s a log level. Really easy to enable (think
mailing list or user support: people can easily send us diagnostic logs).
* Information - we need a bit more work to structure the “event”
struct internally for profiling, but we’d get timings and possibly
internal variable data as well in the event.

I think these are two possible approaches. STAP is less invasive,
easier to start now, but harder to extend later. Logging is more
accessible to users/admins, easier to extend later, but more work
to add now.

What do we think?

--
Sincerely,

William

_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@xxxxxxxxxxxxxxxxxxxxxxx
