Hi Will

> How well or poorly did the performance tools work in identifying the
> performance problem?

I think profiling CPU usage at the desktop level has two important
properties:

    1. A call graph is essential
    2. The data don't have to be very accurate

Ad 1: The desktop CPU problems are generally algorithmic in nature. The
big improvements come from fixing O(n^2) algorithms and from adding
caching and other high-level optimizations. To do this it is essential
to know *why* something time-consuming is being done, so that you can in
the best case change the algorithm to not actually do it anymore.

Ad 2: Since you are working on high-level optimizations, you need to
know stuff like "30% in metacity" and get a rough break-down of those
30%. The profiler must not be so intrusive that the applications become
unusable, but slightly skewed data is not a disaster.

This high-level optimization is in contrast to tuning of inner loops,
where the properties are reversed:

    1. In which function do we spend the time
    2. What, exactly, is the CPU doing. You want to know about cache
       misses and divisions and branch predictions and such things. You
       want to know in what lines of source code the time is spent.

In this case you generally don't try to stop doing it, you try to do it
faster.

The sysprof profiler, which can be checked out of GNOME cvs, is clearly
aiming at the first kind of profiling.

Sysprof works with a kernel module that 50 times per second generates a
stacktrace of the process in the "current" variable, unless the pid of
that process is 0. A userspace application then reads those stacktraces
and presents the information graphically in lists and trees. So it is a
statistical, sampling profiler.

The kernel code probably reveals that I am not an experienced kernel
hacker. Generally I worked from various driver writing guides I found on
the net, and I consider it quite likely to break on more exotic kernels,
where "exotic" means different from mine.
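
To make the "statistical, sampling" part concrete, here is a minimal
sketch of how a userspace program could turn sampled stacktraces into
rough per-function percentages. It is written in Python for brevity and
is not sysprof's actual code: the name `summarize`, the stack
representation (a list of function names, outermost caller first) and
the sample data are all made up for illustration. One assumption worth
flagging: a function that appears several times in one stack is counted
only once for that sample, which is one simple way to keep recursive
functions from being counted twice.

```python
from collections import Counter


def summarize(samples):
    """Return {function: (self_percent, total_percent)} for sampled stacks.

    Each sample is a list of function names, outermost caller first.
    "self" counts samples where the function was the innermost frame;
    "total" counts samples where it appeared anywhere on the stack.
    """
    self_hits = Counter()
    total_hits = Counter()
    n_samples = 0

    for stack in samples:
        if not stack:
            continue  # ignore empty samples
        n_samples += 1
        self_hits[stack[-1]] += 1
        # set() so a recursive function is counted once per sample
        for name in set(stack):
            total_hits[name] += 1

    if n_samples == 0:
        return {}

    return {
        name: (100.0 * self_hits[name] / n_samples,
               100.0 * total_hits[name] / n_samples)
        for name in total_hits
    }


# Hypothetical samples; at 50 samples per second each one stands for
# roughly 20 ms of CPU time.
samples = [
    ["main", "draw", "layout"],
    ["main", "draw", "draw", "paint"],  # "draw" recursing
    ["main", "idle"],
    ["main", "draw", "layout"],
]

for name, (self_pct, total_pct) in sorted(summarize(samples).items()):
    print(f"{name:8s} self {self_pct:5.1f}%  total {total_pct:5.1f}%")
```

With only a handful of samples the percentages are of course very
coarse, but for the high-level kind of profiling described above that
is usually good enough.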

Its killer feature I think is the presentation of the data. For each
function you can get a complete break-down of the children in which that
function spends its time. This even works with recursion, including
mutual recursion. Generally it never reports a function as calling
itself; instead it combines the numbers correctly. The not completely
trivial details would make this mail much longer.

That you can change the view of the data quickly makes it possible to
get a good high-level overview of the performance characteristics of the
system.

A different property sysprof has is that it is fairly easy to get
running. Just install a kernel module and start the application and you
are set. I found oprofile a bit more difficult to get started with.

It seems to me that since oprofile probably reports more and better data
than my kernel module, we should try to get the graphical presentation
from sysprof to present oprofile data. It shouldn't be too difficult to
do this; the presentation code was lifted from the memprof/speedprof
profiler and is quite independent of the rest of the profiler. (Actually
you could argue that the presentation code pretty much _is_ the entire
profiler.)

Another thing that might be nice is a library that would allow symbol
lookup in binaries. I spent quite a bit of time whacking the memprof
code to deal with prelinked binaries, and I am not too confident I got
it completely right.

Soeren