Heya,

With the Google Summer of Code application deadline closing in fast, I would really appreciate some feedback on my application so far, especially on which parts of this idea would be appreciated the most and which parts could be done without.

I have been using git on several projects so far and am very happy with the way it does things. When looking at TortoiseSVN I noticed that it comes with a 'statistics' button that allows you to see which users have done what. Even though it is limited in that it can only show how many commits were made, I think this is an important feature for any VCS.

I became aware of the importance of statistics during a project at my university (we had to use Subversion). During the project I noticed I used these statistics to talk about fair distribution of work, and it really helped to get everybody pulling in the same direction. Keeping that in mind, I tried to get such statistics from git. Git provides a 'commits per user' feature under 'git shortlog -s -n -c master' (note the order of the switches).

Consider Ohloh, an external tool that provides commit information about contributors to a project. It provides a quick overview of all contributors to a project, and what their contribution has been so far. At the moment git does not have anything similar, even though all the data needed for such an analysis is present. Integration with gitk and gitweb would allow the data to be presented in a clear and informative way.

Another bit of interesting information would be 'who is maintaining this code?'. Such information is especially useful when trying to decide whom to send a copy of a patch to. Consider that git already contains the e-mail address of each developer that maintains a certain bit of code (this information is included in each commit). If we now find out who maintains the code that was changed in a commit, git-format-patch could automatically include them in the cc: field.
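To make the cc: idea a bit more concrete, here is a rough Python sketch. It is only an illustration under some assumptions: it looks at file-level history with 'git log' rather than at the exact lines a patch touches (which is what blame would give), and the function names (git_lines, rank_authors, suggest_cc) are made up for this example and are not part of git. It also does not handle a root commit, which has no parent to start the history from.

```python
import subprocess
from collections import Counter


def git_lines(args, repo="."):
    """Run a git command and return its non-empty output lines."""
    out = subprocess.run(
        ["git", *args], cwd=repo, check=True,
        capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def rank_authors(author_lines, exclude=None, limit=3):
    """Return the most frequent authors, skipping the patch author."""
    counts = Counter(a for a in author_lines if a != exclude)
    return [author for author, _ in counts.most_common(limit)]


def suggest_cc(commit, repo=".", history=50):
    """Suggest cc: candidates for a commit (hypothetical helper).

    Assumption: whoever committed most often to the touched files
    recently is a reasonable stand-in for 'the maintainer'.
    """
    # Files changed by the commit.
    files = git_lines(
        ["diff-tree", "--no-commit-id", "--name-only", "-r", commit], repo)
    # The author of the patch itself does not need to be cc'ed.
    patch_author = git_lines(
        ["log", "-1", "--format=%an <%ae>", commit], repo)[0]
    authors = []
    for path in files:
        # Authors of the last `history` commits to this file, looking
        # only at history before the commit in question.
        authors += git_lines(
            ["log", "-n", str(history), "--format=%an <%ae>",
             commit + "^", "--", path], repo)
    return rank_authors(authors, exclude=patch_author)
```

Inside a repository, suggest_cc("HEAD") would return up to three 'Name <email>' strings. A real implementation would probably want to weigh recent commits more heavily and look at the changed lines instead of whole files.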
Similarly, one might be interested in what code a maintainer is currently working on. In a broader sense it might be interesting to determine what part of the code is most actively worked on, and what part of the code is most stable. This is most interesting when deciding whether an API is ready to be published. (If the API is changing a lot it might be better to wait until it has stabilized.) This information could even be used to find 'edit wars', in which a part of the code is changed over and over again.

My plan for this summer is to create a 'statistics' feature for git. It would provide the following functionality:

* Show how many commits a specific user made.
* Show the (average) size of their changes (in lines, for example).
* Show a 'total diff', that is, the difference between the source with and without their changes, including its size (with, for example, a -c switch).
* Show which contributors have contributed to the part of the code that a patch modifies.
* Show what part of the code a maintainer is working on the most.
* Define an output format for this information that can be used by other tools (such as gitk and gitweb).
* (Optional) Integrate all this information with gitk and gitweb.

Implementation would probably start out with Python scripts, since those are easy to modify and combine with other scripts. As milestones are reached on time, or ahead of time, attention could be shifted to converting these to C and combining them with the rest of git. When the other milestones are finished, time could be spent on using the newly added features in gitk and/or gitweb.

To achieve all these milestones, heavy use can be made of existing git commands. For example, getting the total number of commits from a maintainer can be achieved with the less-than-intuitive 'git shortlog -s -n -c master'; providing an alias to this command would make it easier to use this functionality.
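As a small illustration of building on existing commands, the sketch below wraps shortlog from Python and turns its summary output into (author, commits) pairs. Note the assumptions: it passes only the -s and -n switches (it leaves out the -c switch mentioned above), the function names are invented for this example, and the revision is always passed explicitly because shortlog reads from standard input when it is given no revision and is not attached to a terminal.

```python
import subprocess


def parse_shortlog(text):
    """Parse 'git shortlog -s -n' output into (author, commits) pairs.

    Each line is expected to hold a commit count followed by
    whitespace and the author name.
    """
    result = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        count, author = parts
        result.append((author.strip(), int(count)))
    return result


def commits_per_author(rev="master", repo="."):
    """Return commit counts per author for the given revision."""
    out = subprocess.run(
        ["git", "shortlog", "-s", "-n", rev], cwd=repo, check=True,
        capture_output=True, text=True,
    ).stdout
    return parse_shortlog(out)
```

Keeping the parsing separate from the git call makes it easy to swap the subprocess for a direct function call later, should the feature be converted to C or fed from a different source.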
Since other git commands will be used a lot, performance may suffer as a result of piping and parsing results from one command to another. When a feature is converted to C later on, attention could be given to passing the result directly from one function to another.

To determine which users have been active on a file, git's built-in 'blame' functionality can be used. Since git blame is very fast, it should be no problem to make extensive use of it in determining maintainer focus. In a similar way it can be used to determine who has worked on a file recently.

I am a Dutch student, doing my Bachelor's at Delft University of Technology. I study 'Technische Informatica', Dutch for 'Computer Science'. Even before starting fourth grade in high school I learned C++ so that I could help out as a coder on a MUD (Multi User Dungeon). In grades four through six I followed the optional "Informatica" course (a high school version of 'Computer Science'). We learned Java and SQL, nothing too difficult, but it got me wanting to learn more. I learned to pick up other languages on my own, which is probably the most valuable thing I learned. I have used git on numerous projects so far, although I am not yet familiar with some of its more elaborate features.

My motivation for this particular idea I have described above. Enjoying working with git made me want to work on it as my Google Summer of Code project. Knowing that an original idea has a better chance of being selected, I spent a lot of time looking for ways to improve git that would be worth a summer of coding. I'm really looking forward to coding for git, and I think GSoC would be an awesome introduction not only to its codebase but also to contributing to a large project.

Thank you for your time and attention,

Sverre Rabbelier (SRabbelier on #git)