Re: Looking up objects that point to other objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 26, 2011 at 09:01:22PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Here's a couple of tasks that require brute-force with the Git object
> format that I've wanted to do at some point.
> 
>  * Associate a blob with trees
> 
>    Given a blob sha1 find trees that reference it.
> 
>  * Associate trees with commits / other trees.
> 
>    Given a tree find which commit points to that tree, or a parent
>    tree N levels up the stack that a commit points to.

I've more frequently wanted to find the entrance and exit points of a
particular blob in history, and used something like:

  git log --all --no-abbrev -c --raw --format='commit %H' |
  perl -le '
    my @blobs = map { qr/$_/ } @ARGV;
    while(<STDIN>) {
      if (/^commit (.*)/) {
        $commit = $1;
      }
      else {
        foreach my $re (@blobs) {
          next unless /$re/;
          print $commit;
          last;
        }
      }
    }
  ' $blobs

which is fairly efficient. It's brute-force, but at least it all happens
in O(1) processes. It can find blobs in git.git in about a minute or so
on my machine.

I don't think there's a way to ask for all of the trees in a commit in a
single process. It wouldn't be hard to write in C, of course, but it's
not something the current tools support. However, you can use the above
script to narrow the range of commits that you know contain a blob, and
then individually run ls-tree each one. It's better at least than running
ls-tree on every commit in the repo.

Anything that iterates over commits is going to end up seeing the same
trees again and again. I think you could probably do better by thinking
of it like a directed graph problem. Nodes are sha1s, and edges are any
references of interest:

  1. For a commit, make an edge from the commit to its tree.

  2. For a tree, make an edge from the tree to each of its entries.

And then the problem is reduced to "find all commit nodes that have a
path to $blob". Which you can do by breadth-first search from the
commits (or backwards from the blob).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]