On Wed, Apr 16, 2008 at 10:55:03PM +0300, Adrian Bunk wrote:
> On Wed, Apr 16, 2008 at 12:02:47PM -0700, Andrew Morton wrote:
> > On Wed, 16 Apr 2008 16:26:34 +0300
> > Adrian Bunk <bunk@xxxxxxxxxx> wrote:
> >
> > > On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > > > I'm not subscribed to the kernel mailing list, so please include me in
> > > > the cc if you don't reply to the git list (which I am subscribed to).
> > > >
> > > > Git is participating in Google Summer of Code this year and I've
> > > > proposed to write a 'git statistics' command. This command would allow
> > > > the user to gather data about a repository, ranging from "how active
> > > > is dev x" to "what did x work on in the last 3 weeks". Its main
> > > > feature, however, would be an algorithm that ranks commits as being
> > > > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > > > that can aid in determining this, a commit msg along the lines of
> > > > "fixes ..." being the most obvious.)
> > > >...
> >
> > Sounds like an interesting project.
> >
> > > At least with the data we have currently in git it's impossible to
> > > figure that out automatically.
> > >
> > > E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11
> > > (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine
> > > automatically that it is a bugfix, and the commit that introduced
> > > the bug?
> > >
> > > You can always get some data, but if you want to get usable statistics
> > > you need explicit tags in the commits, not some algorithm that tries
> > > to guess.
> >
> > Well yes. One outcome of the project would be to tell us what changes we'd
> > need to make to our processes to make such data gathering more effective.
> >
> > Of course, we may not actually implement such changes. That would depend
> > upon how useful the output is to us.
>
> That you can add this information through tags is clear, but according
> to his SoC application that's not what he wants to do.
>
> According to his application he wants to determine automatically whether
> a commit was a fix or whether a commit introduced a bug by doing stuff
> like tracking whether a changed line was modified again shortly after a
> commit.
>
> This plan of his will simply not result in accurate numbers.

They won't be completely accurate, but who knows, maybe they'd turn out
to have a higher rate of accuracy than we'd expect. (I assume you could
do a closer manual study of a small random sample of the results to
estimate the accuracy.) Seems worth a try.

> Sure, you will get some numbers, but if anyone would e.g. wrongly accuse
> me that 2% of my commits last year introduced bugs I would get
> ***really*** angry.

It's just an experiment; reasonable people won't take it as the final
word.

--b.
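
For what it's worth, the "manual study of a small random sample" idea could be sketched roughly as below. This is only an illustration and not anything from the SoC proposal: the function names (sample_commits, wilson_interval) and the choice of a Wilson score interval are my own assumptions. The idea is to pick some commits at random, check the tool's label for each by hand, and report a confidence interval for the accuracy rather than a single number.

```python
import math
import random


def sample_commits(commit_ids, k, seed=0):
    """Pick up to k distinct commits uniformly at random for manual review.

    commit_ids: any iterable of commit identifiers (hypothetical input,
    e.g. the hashes the classifier has already labeled).
    """
    rng = random.Random(seed)  # fixed seed so the review set is reproducible
    pool = list(commit_ids)
    return rng.sample(pool, min(k, len(pool)))


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for the classifier's accuracy.

    correct: number of sampled commits whose label a human agreed with.
    total:   number of sampled commits reviewed.
    z:       normal quantile (1.96 is roughly a 95% interval).
    """
    if total <= 0:
        raise ValueError("need at least one reviewed commit")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Made-up identifiers and counts, purely to show the call pattern.
    labeled = ["commit-%04d" % i for i in range(5000)]
    to_review = sample_commits(labeled, 100)
    low, high = wilson_interval(correct=80, total=len(to_review))
    print("estimated accuracy between %.2f and %.2f" % (low, high))
```

With a sample of 100 the interval is still fairly wide, which is a reasonable argument for treating any per-developer percentage as a rough estimate and not a verdict.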