[RFC] Tree blame (git blame <directory>)

Currently git-blame supports only ordinary files (blobs).  If you try to
use git-blame on a directory, it errors out with a somewhat cryptic error
message:

  $ git blame Documentation
  fatal: unsupported file type Documentation

Meanwhile some git web interfaces (e.g. GitHub and Gitorious), most
probably following web interfaces for file-based VCSs such as ViewVC for
CVS and Subversion, provide a kind of "tree blame" view as the default view
for directory contents.  That is, for each entry in the given directory
they show 'last changed' info, namely the author, date and summary of the
commit that changed the entry to its current version.

I don't know what algorithm they use to generate this info (well,
I could find out in the case of Gitorious... if I could read Ruby ;-)),
but I suspect that they might use a somewhat inefficient algorithm to find
such info.  Some time ago I tried to add such a 'tree_blame' view to gitweb;
you can check the result in the 'gitweb/tree_blame' branch in my
git/jnareb-git.git repository at repo.or.cz:
  http://repo.or.cz/w/git/jnareb-git.git?a=commitdiff;h=gitweb/tree_blame
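
To make the suspected inefficiency concrete, here is a minimal sketch of
the naive approach: list the directory with "git ls-tree", then run one
path-limited "git log -1" per entry.  This is only an illustration; I am
not claiming that GitHub, Gitorious or the gitweb branch above work this
way.  The function names are made up for this example.

```python
import subprocess


def parse_ls_tree_entry(record):
    """Split one 'git ls-tree -z' record into (mode, type, sha, name)."""
    meta, name = record.split("\t", 1)
    mode, objtype, sha = meta.split(" ")
    return mode, objtype, sha, name


def git(*args):
    """Run a git command in the current repository and return its stdout."""
    result = subprocess.run(("git",) + args, check=True,
                            capture_output=True, text=True)
    return result.stdout


def naive_tree_blame(rev, directory=""):
    """Yield 'last changed' info for each entry of DIRECTORY at REV.

    Hypothetical helper: it starts one history walk per entry, which is
    exactly the cost a built-in tree blame could avoid.
    """
    path = "" if directory in ("", ".") else directory.strip("/")
    prefix = path + "/" if path else ""
    for record in git("ls-tree", "-z", f"{rev}:{path}").split("\0"):
        if not record:
            continue
        mode, objtype, sha, name = parse_ls_tree_entry(record)
        # Last commit reachable from REV that touched this entry.
        out = git("log", "-1", "--format=%H%x00%an%x00%at%x00%s",
                  rev, "--", prefix + name)
        commit, author, time, summary = out.rstrip("\n").split("\0", 3)
        yield {"mode": mode, "type": objtype, "object": sha, "name": name,
               "commit": commit, "author": author,
               "author-time": int(time), "summary": summary}
```

For a directory with N entries this walks the history N times, even though
a single walk that compares each commit's tree with its parent's would be
enough in principle.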


It would be nice if "git blame <directory>" gave us the required
information; tools such as GitHub, Gitorious or gitweb could
use the '--porcelain' or '--incremental' output.

Unfortunately I don't know this part of the code well enough to write it
easily myself.  I would think that it wouldn't be too hard to code;
certainly easier than git-blame for ordinary files.

I think that the ordinary git-blame output for trees (directories) could
mimic the "ls -l" output format as far as possible, i.e. where currently
  $ git ls-tree --abbrev v1.6.3.3
generates the following output:
  ...
  100644 blob e57630e     walker.c
  100644 blob 8a149e1     walker.h
  100644 blob 7eb3218     wrapper.c
  100644 blob 4c29255     write_or_die.c
  100644 blob 819c797     ws.c
  100644 blob 1b6df45     wt-status.c
  100644 blob 78add09     wt-status.h
  100644 blob b9b0db8     xdiff-interface.c
  100644 blob 7352b9a     xdiff-interface.h
  040000 tree ef5d413     xdiff

then
  $ git blame --abbrev v1.6.3.3 -- .
would generate

  100644 blob e57630e ba19a80 Junio C Hamano      Feb 10 17:42   walker.c
  100644 blob 8a149e1 c13b263 Daniel Barkalow     Apr 26  2008   walker.h
  100644 blob 7eb3218 fc71db3 Alex Riesen         Apr 29 23:21   wrapper.c
  100644 blob 4c29255 559e840 Junio C Hamano      Jul 20  2008   write_or_die.c
  100644 blob 819c797 a437900 Junio C Hamano      Jun 21 02:35   ws.c
  100644 blob 1b6df45 2af202b Linus Torvalds      Jun 18 10:28   wt-status.c
  100644 blob 78add09 6c2ce04 Marius Storm-Olsen  Jun  5  2008   wt-status.h
  100644 blob b9b0db8 eb3a9dd Benjamin Kramer     Mar  7 21:02   xdiff-interface.c
  100644 blob 7352b9a 86295bb Rene Scharfe        Oct 25  2008   xdiff-interface.h
  040000 tree ef5d413 5719db9 Charles Bailey      May 25 01:21   xdiff/

or something like that.  The date doesn't have to be in this strange format
used by 'ls'.  Also, instead of the name we could use the username part of
the email, or just the email; OTOH git-blame uses the above format for the
author.
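
For reference, the 'ls' date convention used in the mock-up above (time of
day for recent entries, year for entries older than about six months) can
be sketched as below.  The function name is made up for this example, and
it formats in UTC for simplicity, whereas the mock-up presumably shows
dates in some local time zone.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
SIX_MONTHS = timedelta(days=365.2425 / 2)


def ls_style_date(timestamp, now):
    """Format a Unix TIMESTAMP the way 'ls -l' would, relative to NOW."""
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    age = datetime.fromtimestamp(now, tz=timezone.utc) - when
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d}"
    if timedelta(0) <= age < SIX_MONTHS:
        # Recent entry: show the time of day.
        return f"{stamp} {when:%H:%M}"
    # Old (or future-dated) entry: show the year instead.
    return f"{stamp}  {when.year}"
```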

The porcelain / incremental output format for "git blame <directory>"
wouldn't need to change much from that of "git blame <file>"; line numbers
do not matter though, as what is important is the SHA-1 of the entry (blob,
tree or commit).  For
  $ git blame --porcelain v1.6.3.3 -- .
the blame output for the last two entries could look like the following:
  86295bb6bac1482d29650d1f77f19d8e7a7cc2fe 7352b9a9c204c2b1d4ca9df5ce040fe22d6f521c
  author Rene Scharfe
  author-mail <rene.scharfe@xxxxxxxxxxxxxx>
  author-time 1224941475
  author-tz +0200
  committer Junio C Hamano
  committer-mail <gitster@xxxxxxxxx>
  committer-time 1224961771
  committer-tz -0700
  summary add xdi_diff_hunks() for callers that only need hunk lengths
  filename xdiff-interface.h
  100644 blob 7352b9a9c204c2b1d4ca9df5ce040fe22d6f521c    xdiff-interface.h
  5719db91ce5915ee07c50f1afdc94fe34e91529f ef5d413237b3a390007fba56671b00d7c371ae1e
  author Charles Bailey
  author-mail <charles@xxxxxxxxxxxxx>
  author-time 1243210874
  author-tz +0100
  committer Junio C Hamano
  committer-mail <gitster@xxxxxxxxx>
  committer-time 1243234594
  committer-tz -0700
  summary add xdi_diff_hunks() for callers that only need hunk lengths
  filename xdiff
  040000 tree ef5d413237b3a390007fba56671b00d7c371ae1e    xdiff
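
To show that a tool could consume such output easily, here is a minimal
parser sketch for the format proposed above.  The format itself is only
this proposal (no existing git command emits it), and the function and
key names are made up for this example.

```python
import re

# "<commit sha> <entry sha>" opens a record.
HEADER = re.compile(r"^([0-9a-f]{40}) ([0-9a-f]{40})$")
# "<mode> <type> <entry sha> <name>" closes it, as in ls-tree output.
ENTRY = re.compile(r"^(\d{6}) (blob|tree|commit) ([0-9a-f]{40})\s+(.+)$")


def parse_tree_blame_porcelain(text):
    """Return one dict per blamed entry from the proposed output."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        entry = ENTRY.match(line)
        if header:
            current = {"commit": header.group(1), "object": header.group(2)}
        elif current is None:
            continue  # stray line before the first record
        elif entry:
            current["mode"] = entry.group(1)
            current["type"] = entry.group(2)
            current["name"] = entry.group(4)
            records.append(current)
            current = None
        else:
            # "author ...", "author-time ...", "summary ...", etc.
            key, _, value = line.partition(" ")
            current[key] = value
    return records
```

Each resulting dict carries the commit, the entry's object name, mode, type
and name, plus the usual header fields (author, author-time, summary, ...)
keyed by their porcelain names.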


What do you think about adding such a feature?

It could either use the infrastructure for a better '--follow'
implementation, or lead to a better implementation of the '--follow' option
(which currently works only for the simplest cases).  Probably.

-- 
Jakub Narebski
Poland