Heya, I leave to the USA this Friday, and because of US tax laws I would have to pay an insane amount of tax for just 2 weeks of development. I have agreed with my mentor a while a go that I would finish up GSoC before I left, that way avoiding having to pay US taxes. (Over here in the Netherlands I don't have to pay tax over the $4500 because I didn't make >6000 euro this year, the joys of being a student.) As a result it is quite clear what GitStats will look like at the end of my GSoC. I am going to continue working on it though, I am especially interested in getting the '--follow' part of 'git log' working in such a way that it can be incorporated into GitStats. As such, here is a summary of what GitStat is at the moment. From the documentation: $ cat gitstats-* syntax: stats.py author <options> The purpose of the author module is to gather statistics about authors, and to aggregate this information to provide information about all the authors of a repo. Currently the available metrics in the author module are the following: * Determine how many changes an author made (making a distinction between lines added and lines deleted), and record this per author, for all files in the repository. It is possible to get this data for one specific author, (although data for all authors will still be gathered,) because the result is stored per author. * Determine how many commits an author made that affected a specific file. This metric is less granular, but a lot faster. To retreive the more granular result, one could simply iterate over the result of the above metric for each author, and take only the data for the file one is interested in. * Aggregate the information from the first metric, adding up the statistics from each author. This provides a more general 'file activity' metric and is a nice example of how an existing metric can be modified to do something seemingly unrelated. This module does not define any auxillery functions. syntax: stats.py branch <options> The purpose of the branch module is to gather statistics about branches, or related to branches. Currently the available metrics in the branch module are the following: * Which of the branch head does a commit belong to What this metric does is walk down the ancestry of the target commit and increase the 'dilution' by one or each merge it finds. The exception applies that when following the 'primary' parent of the merge (e.g., the branch recorded as the one the merge commit was made on), the dilution is not increased. As such, if the target commit is made on a branch, and then later on it's dilution is calculated, it will have 0 dilution for that branch. The branch with the lowest dilution is deemed to be the branch that the commit belongs to most. Note: In git there is no 'main' branch, as such any and all branches branches that 'branched off' after the target commit will also have dilution 0. It also defines the following auxillery functions: * Retreive the name of a commit if it is a the head of a branch. This can be seen as the reverse of 'rev-parse' for branch heads. This is used internally to provide the user with a sensible name when telling them which branch a commit belongs to. * List all the branches that contain a specific commit, optionally searching through remote branches as well as optionally not filtering at all. This is used internally to not search through branches that do not contain the target commit. syntax: stats.py bug <options> The purpose of the bug module is to gather statistics on bugfixes within the content, and to aggregate this information to provide with a report of the last N commits. Currently the available metrics in the bug module are the following: * Determine whether a specific commit is a bugfix based on other metrics. When one of the metrics is 'positive', that is, it's return value indicates that the examined commit is a bugfix, the 'bugfix rating' is increased by a pre-configured amount. This amount can be specified per metric, and can be set to '0' to ignore it. * Aggregate the above metric over the past N commits. Also, when running the above metric on more than one commit, cache the result of calls to the git binary so that the execution time is reduced. This means that the execution time is not directly proportional to the size of the repository. (Instead, there is a fixed 'start up' cost, after which there is a 'per commit' cost, which is relatively low.) This module does not define any auxillery functions. syntax: stats.py commit <options> The purpose of the commit module is to gather statistics about commits, or related to commits, with the exception of things related to diffs (only retrieving the raw diff is in this module). Currently the available metrics in the author module are the following: * Find all the commits that touched the same paths as the specified commit. This is implemented by passing the result of the 'paths touched' auxillary function to the 'commits that touched' auxillary function. See below. * Retrieve the diff of a commit, either with or without context, and optionally ignoring whitespace changes. This method also works for the root commit (by making use of the '--root' option to 'diff tree'. * Show only commits of which the commit message, and/or the commit diff, match a specific regexp. This is a simple reimplementation of 'git log -S' and 'git log --grep'. It is preferable to instead use the options native to 'git log', than to use this slower version. Note: the regexps for the 'commit message' and the 'commit diff' may be different. It also defines the following auxillary functions: * Print a commit in a 'readable' way, this is by default 'git log -1 --name-only'. By the way of an environmental variable ('GIT_STATS_PRETTY_PRINT'), it can be made to instead use 'git log -1 --prety=oneline' by setting the variable to 'online'. * Retrieve all the paths that a specific commit touches, this is used internally to limit the commits that have to be searched when looking for a merge. That is done by passing the output of this method to the one described below. * Find all commits that touched a list of files, optionally treating the paths as relative to the current working directory. syntax: stats.py diff <options> The purpose of the diff module is to gather statistics about diffs, or related to diffs. Currently the available metrics in the diff module are the following: * Determine whether two commit diffs are equal, optionally checking whether they are reverts instead. It is also possible to just look at what lines were changed (and ignore the actual changes). * Find all commits that are reverted by the specified commit by first retrieving the touched files, and then examining all the commits that that touch the same files. It also defines the following auxillery functions: * Parse a raw commit diff and store it on a hunk-by-hunk basis so that later on it can be examined more carefully by other tools. Line numbers are optionally included so that one can use those. For example, by comparing all the added hunks with the deleted hunks of a second commit, and vise versa, one can check for (partial) reverts. syntax: stats.py index <options> The purpose of the index module is to gather statistics about the index, or related to the index. Currently the available metrics in the index module are the following: * List all the commits that touch the same files as the staged files. This can be useful to find out which commit introduced the bug fixed in this commit (by piping the output of this method to one that looks at which lines were touched for example). It also defines the following auxillery functions: * Get all the staged changes (optionally ignoring newly added files). This is used internally to find all the commits that touched the same files as those that are currently staged. syntax: stats.py matcher <options> The purpose of the matcher module is to compare hunks within one diff to one another, and determine whether there is any code being moved around. Currently the available metrics in the index module are the following: * Try to find a match between the hunks in one diff, so that code moves can be detected. This makes use of the 'diff size calculation' described below. It also defines the following auxillery functions: * Calculate the size of a diff, only counting the amount of lines added, and the amount of lines deleted. This can be used to determine a best 'interdiff' (the shortest one is the best one), when searching for two hunks that are moved around. syntax: stats.py <subcommand> <arguments> Available commands: author Activity for one author, file, or project branch In how far a commit belongs to a branch bug Determine whether a commitis a bugfix commit Basic functionality already present in git diff Compare two diffs and find reverts index Find which commits touched the staged files matcher Try to match hunks in a diff to find moves test Run the unittests for GitStats The stats.py module is the main entry point of GitStats, it dispatches to the commands listed above. When no arguments are passed, it automatically runs the command with '--help' so that a usage message is shown for that command. Each of the modules it uses as subcommands defines a 'dispatch' function that is called with the users arguments (with the exception of the first, which is the name of the command executed). If anything should be returned to the system, the dispatch method should return this value. To run properly it requires the git_stats package to be a subdirectory of the directory it resides in. That is, your directory tree should be something like this: . |-- git_stats | `-- <listing of all installed modules> |-- scripts | `-- <listing of all installed scripts> |-- t | `-- <listing of all installed regression tests> `-- stats.py syntax: stats.py tests <options> The purpose of the tests module is to make available the unittests for GitStats to external programs. It may be used to run the unittests for GitStats from for example a shell script. It's output is made to match the git regression test suite. -- Cheers, Sverre Rabbelier -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html