Re: Profiling discussion

William Brown <william@xxxxxxxxxxxxxxxx> · Fri, 26 Oct 2018 10:25:01 +1000

> On 25 Oct 2018, at 17:46, thierry bordaz <tbordaz@xxxxxxxxxx> wrote:
> 
> 
> 
> On 10/11/2018 12:57 AM, William Brown wrote:
>> On Wed, 2018-10-10 at 16:26 +0200, thierry bordaz wrote:
>> 
>>> Hi William,
>>> 
>>> Thanks for starting this discussion.
>>> Your email raises several aspects (how, for whom, ...) and I think a way
>>> to start would be to write down what we want.
>>> One need is to determine, for a given workload, where we are spending
>>> time, as a way to decide where to invest.
>>> Another need is to collect metrics at the operation level.
>>> 
>> Aren't these very similar? The time we invest is generally on improving
>> a plugin or a small part of an operation, to make the operation as a
>> whole faster.
>> 
> 
> The tools used could be similar, but the difference is in the expected results. For example, I was just talking with a user who reported:
> [24/Oct/2018:12:10:55.012908141 -0800] conn=2400 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"
> [24/Oct/2018:12:11:01.711604553 -0800] conn=2400 op=1 RESULT err=1 tag=109 nentries=0 etime=6.1301230184 csn=5bd0d1cf000000010000
> 
> I tried the same modrdn in my test environment, with no error and no latency.
> 
> [24/Oct/2018:12:14:03.665479821 -0800] conn=138 op=1 MODRDN dn="<DN_one>" newsuperior="(null)"
> [24/Oct/2018:12:14:03.749121724 -0800] conn=138 op=1 RESULT err=0 tag=109 nentries=0 etime=0.0083774655 csn=5bd0d28b000000010000
> 
> So here the expected result is not to improve performance but to have a diagnostic method/tool to know what is going on in production compared to tests.

Yes. This is a perfect example of why we should provide logging, not stap scripts. The user can then enable an access log level that says something like:

MODRDN dn= result …. 
RESULT err=1 tag ….
PROFILE {
    start_op: <time>
    events: [
        startaci: <time>
        aci_log: ….
        endaci: <time>,
        start pre plugins: <time>
        start memberof: <time>
        end memberof: <time>
        end plugin: <time>
        start backend: <time>
        end backend: <time>
    ]
    end_op: <time>
}

We would know exactly what’s wrong with that operation. 

It is probably a good idea to include access log timings too, since I suspect our log subsystem is the problem. This way admins can send us profiling reports from the server, with everything we need to diagnose the issue for each operation. Structuring this log as JSON means we can write tools to parse it, etc.
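
To illustrate the "tools to parse it" point, here is a minimal Python sketch of what such a tool might look like. Everything about the record layout is an assumption for illustration only: no such PROFILE log exists in the server yet, the field names (start_op, end_op, events, name, time) are hypothetical, events are assumed to be named "start <phase>" / "end <phase>", and the sample timings are made up rather than taken from the user's log.

```python
import json


def phase_durations(record):
    """Return (total_seconds, {phase: seconds}) for one profile record.

    Assumes the hypothetical layout described above: each event has a
    "name" of the form "start <phase>" or "end <phase>" and a numeric
    "time" in seconds.
    """
    started = {}
    durations = {}
    for event in record["events"]:
        kind, _, phase = event["name"].partition(" ")
        if kind == "start":
            started[phase] = event["time"]
        elif kind == "end" and phase in started:
            durations[phase] = event["time"] - started.pop(phase)
    total = record["end_op"] - record["start_op"]
    return total, durations


def slowest_phase(record):
    """Return the name of the phase with the largest duration, or None."""
    _, durations = phase_durations(record)
    if not durations:
        return None
    return max(durations, key=durations.get)


# Made-up sample record, only to exercise the functions above.
SAMPLE = """
{"start_op": 0.0,
 "events": [
   {"name": "start aci", "time": 0.001},
   {"name": "end aci", "time": 0.003},
   {"name": "start memberof", "time": 0.004},
   {"name": "end memberof", "time": 6.5},
   {"name": "start backend", "time": 6.5},
   {"name": "end backend", "time": 6.6}
 ],
 "end_op": 6.7}
"""

if __name__ == "__main__":
    record = json.loads(SAMPLE)
    total, durations = phase_durations(record)
    # Print phases from slowest to fastest.
    for phase, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        print(f"{phase}: {secs:.3f}s of {total:.3f}s")
```

With a record like this, an admin (or we) could see at a glance which phase of the operation took the time, without attaching stap to a production server.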

—
Sincerely,

William

_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx