Re: Introduction and Wikipedia and Git Blame

"jamesmikedupont@xxxxxxxxxxxxxx" <jamesmikedupont@xxxxxxxxxxxxxx> · Fri, 16 Oct 2009 16:23:20 +0200

Johannes,
Thanks for your input,
comments below.
Best regards,
mike

On Fri, Oct 16, 2009 at 4:11 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
> Hi,
>
> On Fri, 16 Oct 2009, jamesmikedupont@xxxxxxxxxxxxxx wrote:
>
>> On Fri, Oct 16, 2009 at 1:26 PM, Johannes Schindelin
>> <Johannes.Schindelin@xxxxxx> wrote:
>> >> Here is the discussion on foundation-l :
>> >> http://www.gossamer-threads.com/lists/wiki/foundation/181163
>> >
>> > I found the link to the bazaar repository there, but do you have a Git
>> > repository, too?
>>
>> Not yet. Where should I put it?  Any suggestions?
>
> github.com has a nice interface.
>
> BTW after reading some of the code, I am a bit surprised that you did not
> do it as a .php script outputting fast-import capable text...

I don't really know PHP, and I don't have a debugger or any other tools for it.
I really cannot understand how people can work in such an environment.

I have done all my hacking work as Perl scripts.
These can be rewritten in C later on.

> Okay, so basically you want to analyze the text on a word-by-word basis
> rather than line-by-line.
Yes.

>
> Or maybe even better: you want to analyze the text character-by-character.
> That would also nicely circumvent having to specify just what makes a word a word
> (subject for a lot of heated discussion during the design of the
> --color-words=<regex> patch).

Yes. Someone on IRC suggested reviewing the color-words code; I have the
source code now and will be looking into that.

>
> Basically, if I had to implement that, I would not try to modify
> builtin-blame.c, but write a new program linking to libgit.a, calling the
> revision walker on the file you want to calculate the blame for.  (One of
> the best examples is probably in builtin-shortlog.c.)
>
> Then I would introduce a linked-list structure which will hold the blamed
> regions in this form:
>
>        struct region {
>                int start;
>                struct region *next;
>        };
>
> Initially, this would have a start element with the start offset 0
> pointing to the end element with start offset being set to the size of the
> blob.
>
> Most likely you will have to add members to this struct, such as the
> original offsets (as you will have to adjust the offsets to the different
> file revisions while you go back in time), and the commit it was
> attributed to.
>
> Then I would make modified "texts" from the blob of the file in the
> current revision and its parent revision, by inserting newlines after
> every single byte (probably replacing the original newlines by other
> values, such as \x01).
>
> The reason for this touchup is that the diff machinery in Git only handles
> line-based diffs.
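
To make the touchup step concrete, here is a minimal sketch in C. The
helper name explode_bytes and the NL_PLACEHOLDER macro are hypothetical
and not part of Git; the sketch also assumes the blob never contains the
placeholder byte \x01 itself, which real code would have to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical placeholder for original newlines (assumption: the input
 * never contains this byte). */
#define NL_PLACEHOLDER '\x01'

/*
 * Hypothetical helper, not part of Git: write every input byte on its own
 * line so that a line-based diff effectively compares byte by byte.
 * Returns a malloc'd buffer of 2 * len bytes (caller frees), or NULL on
 * allocation failure. Line k of the output holds input byte k - 1.
 */
static char *explode_bytes(const char *in, size_t len, size_t *out_len)
{
	char *out;
	size_t i;

	assert(in != NULL || len == 0);
	out = malloc(len ? 2 * len : 1);
	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		/* Replace real newlines so they cannot end a "line" early. */
		out[2 * i] = (in[i] == '\n') ? NL_PLACEHOLDER : in[i];
		out[2 * i + 1] = '\n';
	}
	*out_len = 2 * len;
	return out;
}
```

Both the current blob and the parent blob would go through the same
transform before being handed to the diff machinery.
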
>
> Then you can parse the hunk headers, adjust the offsets accordingly, and
> attribute the +++ regions to the current commit (by construction, the
> offsets are equal to the line number in the hunk header).  Here it is most
> likely necessary to split the regions.
>
> You should also have a counter of how many regions are still unattributed so
> you can stop early.
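
The region list outlined above could be sketched roughly as follows. This
is only an illustration under several assumptions: the commit is stored as
a plain string rather than a Git commit object, the counter tracks
unattributed bytes instead of regions, and the adjustment of offsets
between file revisions is left out entirely. The function names
(new_region, region_list_init, split_at, attribute) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct region {
	int start;            /* first byte offset covered by this region */
	const char *commit;   /* assumption: a string id; NULL = unattributed */
	struct region *next;  /* the last node is the end sentinel */
};

static struct region *new_region(int start, const char *commit,
				 struct region *next)
{
	struct region *r = malloc(sizeof(*r));
	if (!r)
		abort();
	r->start = start;
	r->commit = commit;
	r->next = next;
	return r;
}

/* One unattributed region [0, size), followed by the end sentinel at size. */
static struct region *region_list_init(int size)
{
	return new_region(0, NULL, new_region(size, NULL, NULL));
}

/* Make sure a boundary exists at offset; return the node starting there.
 * Requires 0 <= offset <= blob size. */
static struct region *split_at(struct region *head, int offset)
{
	struct region *r = head;

	assert(offset >= head->start);
	while (r->next && r->next->start <= offset)
		r = r->next;
	if (r->start == offset)
		return r;
	assert(r->next != NULL); /* offset must not lie past the sentinel */
	r->next = new_region(offset, r->commit, r->next);
	return r->next;
}

/* Attribute the still-unattributed bytes in [start, end) to commit and
 * lower the counter by the number of bytes newly attributed. */
static void attribute(struct region *head, int start, int end,
		      const char *commit, int *unattributed)
{
	struct region *r, *stop;

	assert(0 <= start && start <= end);
	r = split_at(head, start);
	stop = split_at(head, end);
	for (; r != stop; r = r->next) {
		if (!r->commit) {
			r->commit = commit;
			*unattributed -= r->next->start - r->start;
		}
	}
}
```

The revision walk can stop as soon as the counter reaches zero, since
every byte of the current blob then has a commit assigned.
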

OK, this sounds like a plan. I think it will be a good outline to
start some work from.
I will let you know when I have made some progress.
Thanks,
mike