Hi all, Following last weeks flamegraph runs, I wanted to find more details on exactly what was happening. While flamegraphs highlighted that a changed existed between single and multithreaded servers, it did not help to isolate where the change was occuring. To understand this I have started to work on a set of systemtap scripts that we can use to profile 389ds. These will be included in a future PR. Here are the hisograms from an initial test of profiling "do_search" 1 thread stap test_do_search.stp ^CDistribution of latencies (in nanoseconds) for 441148 samples max/avg/min: 35911542/85694/38927 value |-------------------------------------------------- count 8192 | 0 16384 | 0 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 167285 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 239280 131072 |@@@@@ 25788 262144 |@ 8530 524288 | 252 1048576 | 6 2097152 | 1 4194304 | 3 8388608 | 0 16777216 | 2 33554432 | 1 67108864 | 0 134217728 | 0 16 thread stap test_do_search.stp ^CDistribution of latencies (in nanoseconds) for 407806 samples max/avg/min: 100315928/112407/39368 value |-------------------------------------------------- count 8192 | 0 16384 | 0 32768 |@@@@@@@@@@@@@@@@@@@@ 100281 65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 249656 131072 |@@@@@@@ 37837 262144 |@@@ 18322 524288 | 1171 1048576 | 203 2097152 | 90 4194304 | 74 8388608 | 83 16777216 | 58 33554432 | 21 67108864 | 10 134217728 | 0 268435456 | 0 It's interesting to note the tail latency here: On the 16 thread version we see 67000 less in the 32768 buckets, shifting mostly through the 131072 and 262144 buckets, as well as showing a much greater number of calls in the tail. In thread 1 no operation made it to 67108864, but 10 did in 16thread, along with ~200 more that are higher than 1048567, and ~1500 more that are greater than 524288. This kind of tailing means we have "spikes" of latency throughout the execution, which then have a minor flow on cause other operations to be increased in latency. These are all in nanoseconds, so this means most operations are in do_search for 0.000131072 seconds or less, while there are spikes of operations taking between nearly 0.001048576 and 0.067108864 seconds to complete (which is an 8x to 512x increase in execution time. >From these point I took two measurements of back_ldbm and the access log, knowing these were easy targets for potential areas of latency. Both of these were performed with the 16thread server. I didn't need to do this on the 1 thread because I'm looking for tail latency, not comparisons now. Because we know that there is an increase of ~1500 events of high latency, we are looking for similar numbers to appear in other tail latencies. ^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples max/avg/min: 4757744/31192/13622 value |-------------------------------------------------- count 2048 | 0 4096 | 0 8192 |@@ 14318 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 261265 32768 |@@@@@@@@@@@@ 63533 65536 |@@@ 20624 131072 | 4062 262144 | 304 524288 | 68 1048576 | 33 2097152 | 15 4194304 | 1 8388608 | 0 16777216 | 0 This shows some tail latency in back_ldbm however it's infrequent, only showing a handful of spikes, but they do exist and tend to taper off quickly as they approach 2097152. Certainly these should be looked at as they will likely add up to affecting do_search. stap -v test_access_log.stp ^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 samples max/avg/min: 23843992/5138/1036 value |-------------------------------------------------- count 256 | 0 512 | 0 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 379710 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 361243 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 412480 8192 |@@@@@@@@@@ 83815 16384 |@@@@ 40087 32768 | 5076 65536 | 1253 131072 | 406 262144 | 130 524288 | 72 1048576 | 33 2097152 | 16 4194304 | 19 8388608 | 1 16777216 | 9 33554432 | 0 67108864 | 0 >From this, I can see that while some tail latency exists in some conditions of back_ldbm, there is a much longer and present tail in the access log, with nearly 2000 operations in or exceeding the 65536 mark - remember, in do_search most of the operations as a whole take 65536, so we have latency spikes in slapi_log_access that are as large or larger than whole search operations as a total. Now we can see "most! operations are fimly below 16384. This is what we want to see, but it's interesting that a slapi_log_access can "take up to" a quarter of a whole operations processing on the avg case. What's truly telling here is the tail down to 16777216, with ~250 operations exceeding 262144 ns. This shows that within slapi_log_access, we have have large spiked delays in the behaviour, which will be affecting both the tail latency of do_search as a whole, but also causing other slapi_log_access operations to "hold up". My next steps from here will be: * To add USDT probes to the logs and back_ldbm to get better, fine detail about what is causing the excessive latency. * these probes are also needed to resolve what appears to be a symbol resolution issue with systemtap when optimisations are enabled. * To add probes in other parts of the code base to get better visualisation about where and how long timings are taking through an operation. * To run a lock contention profile (I was unable to do this today due to bugs in systemtap) * To document the setup proceedures * Commit all these into a PR * Document some actionable improvements we can make to improve the server performance. Hope that is interesting to everyone, -- Sincerely, William _______________________________________________ 389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/389-devel@xxxxxxxxxxxxxxxxxxxxxxx