Re: [GitStats] Bling bling or some statistics on the git.git repository

On Sat, Jul 12, 2008 at 12:07 AM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>> On Fri, Jul 11, 2008 at 11:22 PM, Johannes Schindelin
>> Yeah, I wish 'git log -C -C -M --numstat --sacrifice-chicken
>> --pretty=format:%ae --' would take care of that... That is, a git-blame
>> like mechanism that would detect such moves on a per-commit basis and
>> report them would be very useful to me.
>
> Well, the chicken (or better, a goat) should be sacrificed by you...  The
> option I would call "--code-moves".

If you are suggesting that I write a patch for 'git log', I am afraid
that would require quite a bit more skill and knowledge of 'git log'
than I have (which is about null :P).

> But the semantics of that need to be sorted out in a shell script first;
> maybe like I outlined (if that was not coherent, please say so).

Python is one big shell script :P, so if you meant that it should be
part of GitStats (instead of part of 'git log', which I commented on
above), Python would be just fine :). The concept was clear enough,
though; I think I understand what you mean.

> Well, it is not a matter of getting it right, but it is a matter of
> changes.  For example, every time we move code from one program into the
> library, and create a file for that, code changes.

<snip>

Yes, that's true, with what you described it makes sense :).

>> Very much so, but the former I figure can be easily done with 'git log
>> -C -C -M' I discovered (I need to parse its output though, and also
>> determine what to do with moves statistics-wise. Should changes made
>> due to moves just be ignored?)
>
> That is not very interesting, as we often move so small parts (think "one
> function") that -C -C -M does not trigger.

Right, why aim for the easy stuff when there's much more interesting
fun out there? If there were a --code-moves option, I agree that it
would be a lot more interesting than going with the current approach
and throwing in '-C -C -M'.

>> That sounds interesting, I won't need to actually do that though, I
>> already have a diff parser that gives me the lines added VS lines
>> deleted on a hunk-by-hunk basis. If it is a true move (e.g., code
>> removed in file X and added in file Y) it should be trivial to detect
>> that.
>> Something along the lines of:
>> for hunk in added:
>>   if hunk in deleted:
>>     print("Over here!!")
>
> I think that is not enough, as a code move can mean that part of a
> function was refactored into a function.  The consequence is often a
> reindent, and possibly rewrapping.

Mhhh, that would indeed be beyond the scope of what I'd implement by
hand, and should be left to the likes of a diff tool instead, to avoid
reinventing the wheel :).
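
A minimal sketch of the simple case discussed above, with whitespace
collapsed so that a plain reindent or rewrap still matches. It does not
handle moves "with touch-ups". The names `normalize` and
`find_moved_hunks`, and the shape of the `added`/`deleted` inputs, are
assumptions for illustration, not existing GitStats code:

```python
def normalize(hunk_lines):
    """Collapse whitespace so a reindented or rewrapped hunk still matches."""
    # Join all lines, then split on any whitespace: indentation and line
    # breaks no longer matter, only the sequence of tokens does.
    return tuple(" ".join(hunk_lines).split())


def find_moved_hunks(added, deleted):
    """Return (source_file, target_file) pairs for hunks with equal content.

    `added` and `deleted` are assumed to map a file name to a list of
    hunks, each hunk being a list of lines without the leading '+' or '-'.
    """
    # Index the deleted hunks by their normalized content.
    seen = {}
    for path, hunks in deleted.items():
        for hunk in hunks:
            key = normalize(hunk)
            if key:  # skip hunks that contain only whitespace
                seen.setdefault(key, []).append(path)

    moves = []
    for path, hunks in added.items():
        for hunk in hunks:
            for source in seen.get(normalize(hunk), []):
                if source != path:  # only count moves between files
                    moves.append((source, path))
    return moves
```

Comparing whole hunks only catches exact moves; anything with inserted
or changed lines would need a real similarity measure.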

> And it can mean that some lines have to be inserted here and there.  I
> still would count that as a code move "with touch-ups".

True, true, so it turns out that the most interesting data is the most
difficult to mine, how typical.

> So I'd like to see something like
>
> <number-of-commits>: <lines-added> <lines-removed> \
>        <lines-moved-from> <lines-moved-to> <filename>

Ah, I like the idea of recording moved-from and moved-to separately
instead of ignoring them; why throw away such a perfectly useful
statistic? It would be really nice if I could get this data (i.e., the
lines-moved-from and lines-moved-to) from 'git log' instead of having
to calculate it myself.
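
For illustration, one record in the proposed format could be modelled
roughly like this. The class name `FileActivity` and its field names
are hypothetical; only the order of the fields is taken from the
format quoted above:

```python
from dataclasses import dataclass


@dataclass
class FileActivity:
    """Per-file totals, keeping moved lines apart from added/removed ones."""
    filename: str
    commits: int = 0
    added: int = 0
    removed: int = 0
    moved_from: int = 0
    moved_to: int = 0

    def format(self):
        # <number-of-commits>: <lines-added> <lines-removed>
        #     <lines-moved-from> <lines-moved-to> <filename>
        return (f"{self.commits}: {self.added} {self.removed} "
                f"{self.moved_from} {self.moved_to} {self.filename}")
```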

> BTW I realized something else: your
> http://alturin.googlepages.com/full_activity.txt lists only
> "gitk-git/po/es.po" under git-git/po/.  And it has as many added as
> deleted lines.

Correct, that's because that is what 'git log' tells me. Have a look at:
$ git log --pretty=format:%ae --numstat HEAD --
and grep for "\.po"; you'll see that it lists the other .po files under
"/po/de.po"

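As a rough sketch of how that output can be summed per path: the
function name `parse_numstat` is hypothetical, and the code assumes
the usual --numstat layout of "<added><TAB><deleted><TAB><path>" per
file, with "-" in place of the counts for binary files:

```python
from collections import defaultdict


def parse_numstat(log_output):
    """Sum added/deleted line counts per path from 'git log --numstat' text."""
    totals = defaultdict(lambda: [0, 0])
    for line in log_output.splitlines():
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # author line from %ae, or a blank separator line
        added, deleted, path = fields
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts available
        totals[path][0] += int(added)
        totals[path][1] += int(deleted)
    return {path: tuple(counts) for path, counts in totals.items()}
```

Filtering the resulting keys on a ".po" suffix gives the same view as
the grep above.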
> So I suspect that "po/*" really lists both gitk's as well as git-gui's .po
> files, but merged together.

Feasible. If I use '-C -C -M', then the behavior on a directory rename
should be to take the statistics found under that directory and move
them too. That could be expensive though, what with having to search
all the keys to see whether they are affected, and so on.
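
A minimal sketch of that re-keying, assuming the statistics live in a
dict that maps a path to an (added, deleted) tuple. The function name
`move_directory_stats` and that layout are assumptions, not how
GitStats actually stores its data:

```python
def move_directory_stats(stats, old_dir, new_dir):
    """Re-key every entry under old_dir so that it lives under new_dir."""
    old_prefix = old_dir.rstrip("/") + "/"
    new_prefix = new_dir.rstrip("/") + "/"
    moved = {}
    # One pass over all keys: this is the search mentioned above.
    for path, (added, deleted) in stats.items():
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        # Merge with any entry that already exists at the target path.
        prev_added, prev_deleted = moved.get(path, (0, 0))
        moved[path] = (prev_added + added, prev_deleted + deleted)
    return moved
```

Each rename costs one scan over every key, so many directory renames
over a long history could add up; that would need measuring.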

-- 
Cheers,

Sverre Rabbelier