Hi, here's the promised evaluation of the current profiling tools, and a proposal.

Regards, Henrik

TOOLS

Valgrind - retained
Oprofile - retained
gprof, sprof - obsoleted by oprofile
ltrace, ptrace - not capable of profiling dynamically loaded (dlopen-ed) objects
GEGL instrumentation - the same functionality is achieved by the proposal below

Further info: http://sites.google.com/site/computerresearcher/profiling-tools/

################################################################################
PART ONE - VALGRIND - Where the time is spent
################################################################################

Current tools can produce an abundance of profiling data. Callgrind (with
Cachegrind's cache simulation activated) produces 13 different measurements for
every line of code executed during the program run:

 1 on how many times the line was executed
 8 on cache use
 4 on branching

Running callgrind on a command-line gegl program that runs gaussian-blur
produces profiling data for 149 different objects (89 of which are gegl
operations that get loaded)!

NOTE: A serious limitation of Valgrind is that it can only count events; it
cannot tell you how much time things take (such as a cache miss or the
execution time of an instruction). This is because it heavily modifies the
code before running it, which renders any time measurements useless.

I propose to implement a tool that lets the user (step 1) select the data he
is interested in and (step 2) presents the results in an easy-to-understand
way.

Step 1 - SELECTION - user workflow:
1a) Select the libraries of interest.
1b) Select the entry/exit function (normally the main function); only data
    measured inside this function (including calls to other functions) is
    displayed.
1c) Hotpath elaboration, i.e. display and selection of the code execution
    path to display. (TBD)

Step 2 - EVALUATION - workflow:
2a) Code annotation, i.e.
display the selected code annotated with the measurements.
2b) Trend display.

Step 3 - MANAGEMENT - workflow:
3a) Adding new data (command line and web)
3b) Adding new evaluation scenarios (web)
3c) Listing data
3d) Deleting data
3e) Listing scenarios
3f) Deleting scenarios
3g) Adding scenarios

################################################################################
PART TWO - OPROFILE - What the processor is doing
################################################################################

An important limitation of the processor's performance counters is that, out
of a choice of about 100 events, only 2-8 (depending on processor model and
make) can be used at the same time. There have been some attempts by
researchers to multiplex them, but this requires modifying kernel code and is
therefore not accessible to the ordinary developer.

I propose to:
a) Assemble all the important data into the database by repeatedly running
   the test/performance case.
b) Possibly use statistics in order to obtain reliable data (i.e. determine
   which distribution the measurements follow and use this during data
   collection). TBD.
c) Add groups of annotation data to point 2a, e.g. groups such as "L1 cache
   use", "L2 cache use", "Vector extensions", etc.

Internally, the data should normally be stored in the same format as the
valgrind data. This means that only an import tool needs to be written in
order to add oprofile data.

################################################################################
IMPLEMENTATION DETAILS
################################################################################

I propose a web-based implementation using JRuby and Ruby on Rails together
with an SQL database (potentially an object database, e.g. db4o). For the
code annotation, I propose to use the existing Subversion repository for
retrieving the code to be annotated, together with GNU Source-highlight.
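To make the import-tool idea from Part One a bit more concrete, here is a
minimal Ruby sketch (Ruby, since a JRuby/Rails stack is proposed above) of
what reading callgrind data into the internal database format could look
like. It parses only a simplified subset of the callgrind output format (the
"events:" header, fl=/fn= position records and per-line cost rows); position
compression and call records are ignored, and the function name
parse_callgrind is made up for illustration.

```ruby
# Hypothetical import routine for the proposed profiling database.
# Parses a simplified subset of the callgrind output format:
#   events: Ir Dr Dw ...        <- names of the measured event columns
#   fl=<source file>            <- current file
#   fn=<function>               <- current function
#   <line> <count> <count> ...  <- costs for one source line
# Compressed positions ("+n", "*") and call records are not handled here.
def parse_callgrind(text)
  events = []
  file = func = nil
  costs = {}                    # [file, function, line] => [cost per event]
  text.each_line do |raw|
    line = raw.strip
    case line
    when /\Aevents:\s*(.+)/ then events = $1.split
    when /\Afl=(.+)/        then file = $1
    when /\Afn=(.+)/        then func = $1
    when /\A(\d+)\s+(.+)/        # cost line: accumulate counts per position
      nums = $2.split.map(&:to_i)
      key  = [file, func, $1.to_i]
      old  = costs[key] || Array.new(nums.size, 0)
      costs[key] = old.zip(nums).map { |a, b| a + b }
    end
  end
  [events, costs]
end
```

Accumulating per [file, function, line] is what the code-annotation step (2a)
would then read back when displaying the selected sources.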
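The "multiplex by repetition" idea from Part Two, point a), can be sketched in
the same stack. This is a hedged illustration, not oprofile's API: merge_runs,
the event names and the per-run data layout are all invented for the example.
Each run contributes only the 2-8 events the hardware could count at once;
merging the runs yields one record per source position covering the full event
set, averaging any event that was measured in more than one run.

```ruby
# Sketch: combine several oprofile runs, each measuring a different subset of
# events, into one table keyed by event name and source position.
# A run is assumed to look like:
#   { "EVENT_NAME" => { [file, line] => sample_count } }
def merge_runs(runs)
  # Collect every observed count per (event, position).
  sums = Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } }
  runs.each do |run|
    run.each do |event, samples|
      samples.each { |pos, count| sums[event][pos] << count }
    end
  end
  # Average events that were measured in more than one run.
  sums.transform_values do |by_pos|
    by_pos.transform_values { |counts| counts.sum / counts.size }
  end
end
```

Point b) of the proposal would refine the plain average above with a proper
statistical treatment once the distribution of the counts is known.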
_______________________________________________
Gegl-developer mailing list
Gegl-developer@xxxxxxxxxxxxxxxxxxxxxx
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer