Hi,

I have been developing a git tool (based on git's internal API) that finds all the commits that have changed a given line, in order to attribute authorship more accurately.

The motivation is my research on binary code authorship, where I use machine learning to classify who wrote a piece of code. To produce training data, I start with a source code repository that has a known author label for each line, and then compile the project into a binary. That way I know the authorship of the binary code and can apply machine learning techniques to it.

To get the ground truth of authorship for each line, I started with git-blame. But I later found that this is not sufficient: the last commit may only have added a comment or changed a small part of the line, in which case I shouldn't attribute the line to its last author. Of course, who should count as the representative author of a line of code is open to debate. So what I would like to do is find all the commits that have ever changed a line, and then try different approaches to summarizing those commits into a final authorship label (or even a tuple).

I was wondering whether there have been similar debates over accurate authorship in this community before, and whether other people might be interested in this work.

Thanks
--Xiaozhu
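
P.S. To make the idea of summarizing over all the commits of a line more concrete, here is a minimal sketch in Python. It is only an illustration, not how the tool works: the LineChange record, the weighting by number of changed characters, and the rule of skipping comment-only commits are all assumptions made up for this example, and the history data is invented.

```python
from collections import defaultdict
from typing import NamedTuple


class LineChange(NamedTuple):
    """One commit that touched a given line (hypothetical record format)."""
    author: str
    chars_changed: int   # how much of the line this commit rewrote
    comment_only: bool   # True if the commit only touched a comment


def last_author(history):
    """Baseline: the author of the newest change, as git-blame would report."""
    return history[-1].author


def weighted_authorship(history, skip_comment_only=True):
    """Return (author, share) pairs, largest share first.

    An author's share is the fraction of all changed characters that
    they contributed across the commits that touched the line.
    """
    totals = defaultdict(int)
    for change in history:
        if skip_comment_only and change.comment_only:
            continue
        totals[change.author] += change.chars_changed
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [(author, count / grand_total) for author, count in totals.items()]
    # Sort by descending share, then by name so ties are deterministic.
    return sorted(shares, key=lambda pair: (-pair[1], pair[0]))


# History of one line, oldest commit first (made-up data).
history = [
    LineChange("alice", 40, False),   # wrote the line
    LineChange("bob", 10, False),     # renamed a variable
    LineChange("carol", 12, True),    # appended a trailing comment
]

print(last_author(history))          # carol
print(weighted_authorship(history))  # [('alice', 0.8), ('bob', 0.2)]
```

Here the last-commit rule would credit carol for a line that alice mostly wrote, while the weighted summary yields a tuple of authors. Other summaries (first author, majority author, and so on) could be swapped in for weighted_authorship in the same way.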