Update on profiling

After Ludwig's suggestion to separate the machines, and after some heart-stopping LVM issues last week, I've now established a load testing environment with two VMs: one for ldclt and one for DS.

I'm currently altering only the "thread" parameter, since the initial tests showed steep changes in performance based on thread count. I have run two search tests, with one DS instance set to a single thread and the other to 16. The VM in use is a dual core on top of an i5 2.9GHz host, with 4GB of RAM and 40GB of SSD storage.
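
For anyone wanting to reproduce this, the setup is roughly as below. The suffix, credentials, filter range, and instance name are illustrative, and I'm assuming the "thread" parameter here means nsslapd-threadnumber in cn=config:

    # Set the worker thread count on the DS VM (1 or 16 for these tests),
    # then restart the instance so the new value takes effect.
    ldapmodify -x -H ldap://ds-vm:389 -D "cn=Directory Manager" -W <<EOF
    dn: cn=config
    changetype: modify
    replace: nsslapd-threadnumber
    nsslapd-threadnumber: 16
    EOF
    systemctl restart dirsrv@<instance>

    # Drive random searches from the ldclt VM (adjust base, filter and
    # range to match your test data).
    ldclt -h ds-vm -p 389 -b "ou=people,dc=example,dc=com" \
          -e esearch,random -f "uid=XXXXX" -r0 -R99999 -n 10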

I'm going to attach two SVGs to this email for the two tests: 1837 is the single-thread result, 1942 is the 16-thread result.

There are still some quirks with perf resolving symbol names, but the stacks otherwise look correct. I'll be submitting my tooling in a PR once I have completed some more work.
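
For reference, the flamegraphs come from roughly the standard perf + FlameGraph pipeline; the sampling rate, duration, and paths below are illustrative, and the exact commands will be in the PR:

    # Sample on-CPU stacks of ns-slapd while ldclt is running.
    perf record -F 99 -g -p $(pidof ns-slapd) -- sleep 60
    perf script > out.perf

    # Fold the stacks and render the SVG with Brendan Gregg's FlameGraph scripts.
    ./stackcollapse-perf.pl out.perf > out.folded
    ./flamegraph.pl out.folded > ns-slapd.svg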

The idea of a flamegraph is to allow visual inspection to determine high-latency or resource-consuming areas of the codebase. These flamegraphs show some initial changes at a glance. We should expect these changes to be due to lock contention: a single-thread instance should have NO lock contention, whereas a multi-thread instance will show it.

connection_threadmain (our worker threads, with poll/read) shows a decrease in time usage. This makes sense, since there are no locks in connection_threadmain, which means that our "delays" or latency are coming from other code paths.

We spend fractionally more time in libback. This is to be expected, though, since there are extra threads and they can be in different stages of a search, so we expect to see slightly more concurrent threads in each frame.

We spend an increased amount of time in acl_access_allowed: 7.7% vs 5.9%. This would indicate that we may have some delays due to ACL processing, or other locking in that region.

There is also a minor increase in time spent in log_result (3.6% to 3.9%). It's already a bit concerning that we spend ~4% of our time in logging, when it's, I think, only 3 or 4 function calls in do_search out of hundreds. The change appears to be due to a higher number of vsnprintf calls, likely from the slightly higher throughput of the multi-thread version.

I think my next step will be to look at a lock contention profile from SystemTap, as well as potentially adding timing markers to the code base and using SystemTap to record function entry/exit timings to determine where the latency changes are occurring. It appears we have no single source of pain, but rather, as I suspected, lots of little locks throughout the code base that each cause a small yet noticeable delay.
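
Roughly, the entry/exit timing I have in mind looks like the sketch below; the binary path and function name are just examples of the approach (it assumes ns-slapd debuginfo is installed):

    stap -e '
    global entry
    probe process("/usr/sbin/ns-slapd").function("acl_access_allowed").call {
      entry[tid()] = gettimeofday_us()
    }
    probe process("/usr/sbin/ns-slapd").function("acl_access_allowed").return {
      if (entry[tid()]) {
        printf("acl_access_allowed: %d us\n", gettimeofday_us() - entry[tid()])
        delete entry[tid()]
      }
    }'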

Thanks, 

Note: you can click into the content of the .svg to "zoom" to specific function areas and see their percentages, etc.

[Attachments: two SVG flamegraphs, 1837 (single thread) and 1942 (16 threads)]


--
Sincerely,

William

