Johannes,

Thanks for your input; comments below.

Regards,
mike

On Fri, Oct 16, 2009 at 4:11 PM, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
> Hi,
>
> On Fri, 16 Oct 2009, jamesmikedupont@xxxxxxxxxxxxxx wrote:
>
>> On Fri, Oct 16, 2009 at 1:26 PM, Johannes Schindelin
>> <Johannes.Schindelin@xxxxxx> wrote:
>> >> Here is the discussion on foundation-l :
>> >> http://www.gossamer-threads.com/lists/wiki/foundation/181163
>> >
>> > I found the link to the bazaar repository there, but do you have a Git
>> > repository, too?
>>
>> Not yet. Where should I put it? Any suggestions.
>
> github.com has a nice interface.
>
> BTW after reading some of the code, I am a bit surprised that you did not
> do it as a .php script outputting fast-import capable text...

I don't really know PHP, and I don't have a debugger or any other tools for it; I really cannot understand how people work in such an environment. I have done all my hacking work as Perl scripts, which can be rewritten in C later on.

> Okay, so basically you want to analyze the text on a word-by-word basis
> rather than line-by-line.

Yes.

>
> Or maybe even better: you want to analyze the text character-by-character.
> That would also nicely circumvent to specify just what makes a word a word
> (subject for a lot of heated discussion during the design of the
> --color-words=<regex> patch).

Yes. Someone on IRC suggested reviewing the --color-words code; I have the source now and will be looking into it.

>
> Basically, if I had to implement that, I would not try to modify
> builtin-blame.c, but write a new program linking to libgit.a, calling the
> revision walker on the file you want to calculate the blame for. (One of
> the best examples is probably in builtin-shortlog.c.)
>
> Then I would introduce a linked-list structure which will hold the blamed
> regions in this form:
>
> struct region {
>         int start;
>         struct region *next;
> };
>
> Initially, this would have a start element with the start offset 0
> pointing to the end element with start offset being set to the size of the
> blob.
>
> Most likely you will have to add members to this struct, such as the
> original offsets (as you will have to adjust the offsets to the different
> file revisions while you go back in time), and the commit it was
> attributed to.
>
> Then I would make modified "texts" from the blob of the file in the
> current revision and its parent revision, by inserting newlines after
> every single byte (probably replacing the original newlines by other
> values, such as \x01).
>
> The reason for this touchup is that the diff machinery in Git only handles
> line-based diffs.
>
> Then you can parse the hunk headers, adjust the offsets accordingly, and
> attribute the +++ regions to the current commit (by construction, the
> offsets are equal to the line number in the hunk header). Here it is most
> likely necessary to split the regions.
>
> You should also have a counter how many regions are still unattributed so
> you can stop early.

OK, this sounds like a plan. I think it is a good outline for starting some work. I will let you know when I have made some progress.

Thanks,
mike

-- 
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
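The byte-level blame bookkeeping Johannes outlines (region list with an end sentinel, one "line" per byte, attributing hunk ranges and splitting regions, a counter for stopping early) might be sketched in C roughly as below. This is a minimal, hypothetical sketch, not code from Git: the names `explode_bytes`, `parse_new_range`, `blame_init`, `region_split`, `blame_hunk` and `struct blame_state` are made up for illustration, and commits are plain string labels. It assumes the hunk headers come from a zero-context diff of the exploded texts (as `git diff -U0` would produce), because with context lines the `+c,d` range would also cover unchanged bytes. The counter here tracks unattributed bytes rather than regions. It handles a single revision step only: translating the still-unattributed offsets into the parent's coordinates (the "adjust the offsets" step) is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
/* A byte range blamed on `commit`; its length is next->start - start. */
struct region {
	size_t start;		/* byte offset in the revision being examined */
	const char *commit;	/* NULL while unattributed; hypothetical commit label */
	struct region *next;	/* the end sentinel has start == blob size */
};

struct blame_state {
	struct region *head;	/* first region, start == 0 */
	size_t size;		/* blob size, i.e. the sentinel's start */
	size_t unattributed;	/* bytes not yet blamed; the walk may stop at 0 */
};

/* Put every byte on its own "line"; original '\n' bytes become '\x01'. */
static char *explode_bytes(const char *buf, size_t len)
{
	char *out = malloc(2 * len + 1);
	if (!out)
		return NULL;
	for (size_t i = 0; i < len; i++) {
		out[2 * i] = buf[i] == '\n' ? '\x01' : buf[i];
		out[2 * i + 1] = '\n';
	}
	out[2 * len] = '\0';
	return out;
}

/* Read c and d from "@@ -a[,b] +c[,d] @@"; an omitted count means 1. */
static int parse_new_range(const char *line, long *start, long *count)
{
	const char *plus = strstr(line, " +");
	char *end;
	if (strncmp(line, "@@ -", 4) || !plus)
		return -1;
	*start = strtol(plus + 2, &end, 10);
	*count = *end == ',' ? strtol(end + 1, &end, 10) : 1;
	return strncmp(end, " @@", 3) ? -1 : 0;
}

/* One unattributed region [0, size) followed by the end sentinel. */
static int blame_init(struct blame_state *s, size_t size)
{
	struct region *end = calloc(1, sizeof(*end));
	s->head = calloc(1, sizeof(*s->head));
	if (!end || !s->head) {
		free(end);
		free(s->head);
		return -1;
	}
	end->start = size;
	s->head->next = end;
	s->size = s->unattributed = size;
	return 0;
}

/* Make sure a region starts exactly at off; returns it, or NULL on OOM. */
static struct region *region_split(struct blame_state *s, size_t off)
{
	struct region *r = s->head, *tail;
	assert(off <= s->size);
	while (r->next && r->next->start <= off)
		r = r->next;
	if (r->start == off)
		return r;
	if (!(tail = malloc(sizeof(*tail))))
		return NULL;
	tail->start = off;	/* off lies strictly inside r: cut r in two */
	tail->commit = r->commit;
	tail->next = r->next;
	r->next = tail;
	return tail;
}

/* Blame the "+" range of one zero-context hunk header on commit. */
static int blame_hunk(struct blame_state *s, const char *hdr, const char *commit)
{
	long start, count;
	struct region *r, *stop;
	if (parse_new_range(hdr, &start, &count) || count < 0)
		return -1;
	if (count == 0)
		return 0;	/* pure deletion: nothing in this revision to blame */
	if (start < 1 || (size_t)(start - 1 + count) > s->size)
		return -1;
	r = region_split(s, (size_t)(start - 1));	/* line N is byte N - 1 */
	stop = region_split(s, (size_t)(start - 1 + count));
	if (!r || !stop)
		return -1;
	for (; r != stop; r = r->next)
		if (!r->commit) {	/* keep blame that is already assigned */
			r->commit = commit;
			s->unattributed -= r->next->start - r->start;
		}
	return 0;
}
```

For instance, with a 10-byte blob, the header `@@ -3,0 +4,3 @@` blames bytes 3 to 5 and leaves `unattributed` at 7. The sketch never merges adjacent regions with the same commit and never frees the list; both would matter for a real tool.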