Re: Introduction and Wikipedia and Git Blame

Hi,

On Fri, 16 Oct 2009, jamesmikedupont@xxxxxxxxxxxxxx wrote:

> On Fri, Oct 16, 2009 at 1:26 PM, Johannes Schindelin
> <Johannes.Schindelin@xxxxxx> wrote:
> >> Here is the discussion on foundation-l :
> >> http://www.gossamer-threads.com/lists/wiki/foundation/181163
> >
> > I found the link to the bazaar repository there, but do you have a Git
> > repository, too?
> 
> Not yet. Where should I put it?  Any suggestions.

github.com has a nice interface.

BTW after reading some of the code, I am a bit surprised that you did not 
do it as a .php script outputting fast-import capable text...

> >> the question is, is there a blame tool that we can use for multiple 
> >> horizontal diffs on the same line that will be needed for wikipedia 
> >> articles?
> >
> > I am not quite sure what you want to do horizontally there... Can you
> > explain what you want to see?
> 
> Yes, I would like to see all the contributors to each word or line.
> 
> Basically one line of blame per contributor, so many lines of output.
> Ideally we would have something that is usable in an HTML display. Let's
> say, just a blame attribute for each word. So on one line:
> 
> This is a line with two changes first change Second change  end of line
> 
> It would look like this in html :
> This is a line with two changes <span blame=revisionid>first
> change</span><span blame=revisionid>Second change</span> end of line
> 
> The blame edit could look like this :
> REVISION ID 1    48     :  This is a line with two changes first
> change first change \
> REVISION ID 2  48 C:   Second change end of line

Okay, so basically you want to analyze the text on a word-by-word basis 
rather than line-by-line.

Or maybe even better: you want to analyze the text character-by-character.  
That would also nicely sidestep the need to specify just what makes a word 
a word (the subject of a lot of heated discussion during the design of the 
--color-words=<regex> patch).

Basically, if I had to implement that, I would not try to modify 
builtin-blame.c, but write a new program linking to libgit.a, calling the 
revision walker on the file you want to calculate the blame for.  (One of 
the best examples is probably in builtin-shortlog.c.)

Then I would introduce a linked-list structure which will hold the blamed 
regions in this form:

	struct region {
		int start;
		struct region *next;
	};

Initially, the list would consist of a start element with start offset 0, 
pointing to an end element whose start offset is set to the size of the 
blob.

Most likely you will have to add members to this struct, such as the 
original offsets (as you will have to adjust the offsets to the different 
file revisions while you go back in time), and the commit it was 
attributed to.
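
A minimal sketch of such a list, with the extra members added, could look 
like the following.  The member names orig_start and commit, and the 
helpers new_region(), init_regions() and free_regions(), are made up for 
illustration; nothing here is existing Git code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical extended version of the region struct. */
struct region {
	int start;           /* offset in the revision currently examined */
	int orig_start;      /* offset in the newest revision of the blob */
	const char *commit;  /* commit blamed for the region, NULL if none yet */
	struct region *next; /* NULL only for the end element */
};

struct region *new_region(int start, struct region *next)
{
	struct region *r = malloc(sizeof(*r));

	assert(r != NULL);
	r->start = start;
	r->orig_start = start;
	r->commit = NULL;
	r->next = next;
	return r;
}

/*
 * Initial state: one unattributed region starting at offset 0, followed
 * by an end element whose start offset is the size of the blob.
 */
struct region *init_regions(int blob_size)
{
	assert(blob_size >= 0);
	return new_region(0, new_region(blob_size, NULL));
}

void free_regions(struct region *r)
{
	while (r) {
		struct region *next = r->next;
		free(r);
		r = next;
	}
}
```

A region then covers the bytes from its own start up to (but excluding) 
the start of the next element, which is why the end element is needed.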

Then I would make modified "texts" from the blob of the file in the 
current revision and its parent revision, by inserting newlines after 
every single byte (probably replacing the original newlines by other 
values, such as \x01).

The reason for this touchup is that the diff machinery in Git only handles 
line-based diffs.
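
The transformation itself could be sketched like this.  The function name 
explode_bytes() and the NEWLINE_STANDIN macro are made up for this 
example, and the sketch assumes the blob does not already contain the 
byte \x01 (otherwise a different stand-in would have to be chosen).

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for original newlines; assumed not to occur in the blob. */
#define NEWLINE_STANDIN '\x01'

/*
 * Return a newly allocated buffer in which every byte of buf is
 * followed by a newline, so that a line-based diff compares single
 * bytes.  Original newlines are replaced by NEWLINE_STANDIN.  The
 * length of the result (2 * len) is stored in *out_len; the caller
 * has to free() the buffer.
 */
char *explode_bytes(const char *buf, size_t len, size_t *out_len)
{
	char *out = malloc(len ? 2 * len : 1);
	size_t i;

	assert(out != NULL);
	for (i = 0; i < len; i++) {
		out[2 * i] = buf[i] == '\n' ? NEWLINE_STANDIN : buf[i];
		out[2 * i + 1] = '\n';
	}
	*out_len = 2 * len;
	return out;
}
```

Both the blob of the current revision and that of its parent would be 
passed through such a function before they are handed to the diff.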

Then you can parse the hunk headers, adjust the offsets accordingly, and 
attribute the added ("+") regions to the current commit (by construction, 
the byte offsets correspond directly to the line numbers in the hunk 
header).  Here it is most likely necessary to split the regions.
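
As a rough sketch, parsing a unified-diff hunk header and turning its new 
side into a byte range might look like this.  struct hunk, 
parse_hunk_header() and hunk_added_range() are hypothetical names, and 
the sketch assumes the diff was produced with zero context lines, so that 
the new side of each hunk consists of added lines only.

```c
#include <assert.h>
#include <stdio.h>

/* Line ranges from a header of the form "@@ -a,b +c,d @@". */
struct hunk {
	int old_start, old_count;
	int new_start, new_count;
};

/* Parse a hunk header; a missing ",count" means 1.  Returns 0 on success. */
int parse_hunk_header(const char *line, struct hunk *h)
{
	int n = 0;

	assert(line != NULL && h != NULL);
	h->old_count = h->new_count = 1;
	if (sscanf(line, "@@ -%d%n", &h->old_start, &n) != 1)
		return -1;
	line += n;
	if (*line == ',') {
		if (sscanf(line, ",%d%n", &h->old_count, &n) != 1)
			return -1;
		line += n;
	}
	if (sscanf(line, " +%d%n", &h->new_start, &n) != 1)
		return -1;
	line += n;
	if (*line == ',' && sscanf(line, ",%d", &h->new_count) != 1)
		return -1;
	return 0;
}

/*
 * With one byte per line and zero context lines, the added bytes are
 * [start, end) in the current revision: line numbers are 1-based,
 * byte offsets 0-based.  A pure deletion yields an empty range.
 */
void hunk_added_range(const struct hunk *h, int *start, int *end)
{
	*start = h->new_count ? h->new_start - 1 : h->new_start;
	*end = *start + h->new_count;
}
```

The resulting range is what would be attributed to the current commit; 
the old side of the hunk tells you by how much the offsets of the 
following regions have to be shifted for the parent revision.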

You should also keep a counter of how many regions are still unattributed, 
so you can stop walking the history early once it reaches zero.
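
Splitting regions and keeping that counter up to date could be sketched 
as follows.  The commit member and the functions new_region(), 
split_region() and attribute_range() are made up for this example, and 
the sketch only works in the coordinates of a single revision; it leaves 
out the offset adjustment needed when going back in time.

```c
#include <assert.h>
#include <stdlib.h>

struct region {
	int start;
	const char *commit;  /* hypothetical member; NULL while unattributed */
	struct region *next; /* NULL only for the end element */
};

struct region *new_region(int start, const char *commit,
			  struct region *next)
{
	struct region *r = malloc(sizeof(*r));

	assert(r != NULL);
	r->start = start;
	r->commit = commit;
	r->next = next;
	return r;
}

/* Split r at offset "at"; both halves keep r's attribution. */
static void split_region(struct region *r, int at)
{
	r->next = new_region(at, r->commit, r->next);
}

/*
 * Attribute the still unattributed bytes in [start, end) to commit,
 * splitting regions where needed.  *unattributed is the number of
 * regions without a commit; the caller can stop once it reaches 0.
 */
void attribute_range(struct region *head, int start, int end,
		     const char *commit, int *unattributed)
{
	struct region *r;

	if (start >= end)
		return;
	for (r = head; r->next; r = r->next) {
		if (r->commit || r->next->start <= start || r->start >= end)
			continue;
		if (r->start < start) {
			/* keep the left part; the next round handles the rest */
			split_region(r, start);
			(*unattributed)++;
			continue;
		}
		if (r->next->start > end) {
			split_region(r, end);
			(*unattributed)++;
		}
		r->commit = commit;
		(*unattributed)--;
	}
}
```

Starting with a single region covering the whole blob and a counter of 1, 
each commit in the walk would call attribute_range() once per added range.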

Ciao,
Dscho
