Re: [Foundation-l] Wikipedia meets git

Avery Pennarun <apenwarr@xxxxxxxxx> · Wed, 21 Oct 2009 16:31:20 -0400

On Wed, Oct 21, 2009 at 3:49 PM, Bernie Innocenti <bernie@xxxxxxxxxxx> wrote:
> And here's the catch: the history of individual files is not
> directly represented in a git repository. It is typically scattered
> across thousands of commit objects, with no direct links to help find
> them. If you want to retrieve the log of a file that was changed only 6
> times in the entire history of the Linux kernel, you'd have to dig
> through *all* of the 170K revisions in the "master" branch.
>
> And it takes some time even if git is blazingly fast:
>
>  bernie@giskard:~/src/kernel/linux-2.6$ time git log  --pretty=oneline REPORTING-BUGS  | wc -l
>  6
>
>  real   0m1.668s
>  user   0m1.416s
>  sys    0m0.210s
>
> (my laptop has a low-power CPU. A fast server would be 8-10x faster).
>
>
> Now, the English Wikipedia seems to have slightly more than 3M articles,
> with--how many? tens of millions of revisions for sure. Going through
> them *every time* one needs to consult the history of a file would be
> 100x slower. Tens of seconds. Not acceptable, uh?

I think this slowness could be overcome using a simple cache of
filename -> commitid list, right?

That is, you run some variant of "git log --name-only" once and, for
each file changed by each commit, append the commit id to that file's
commit list.  On later commits, a hook updates the cache.  To view the
history of a particular file, you look up only the commits in that
file's list instead of walking the whole history.
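
A rough sketch of what such a cache might look like, in Python.  This
is only an illustration under a few assumptions, not a tested tool: the
function names (parse_name_only_log, build_index, record_commit) are
made up, the \x01 marker is just a byte I'm assuming never starts a
file name, and the index lives in memory rather than on disk.

```python
import subprocess
from collections import defaultdict

# Assumption: this byte never appears at the start of a file name.
MARKER = "\x01"


def parse_name_only_log(log_text):
    """Turn `git log --name-only` output into {filename: [commit ids]}.

    Expects each commit header line to be MARKER followed by the commit
    id, as produced by --pretty=format:%x01%H. Ids are kept in the
    order git printed them (newest first by default).
    """
    index = defaultdict(list)
    commit = None
    for line in log_text.split("\n"):
        if line.startswith(MARKER):
            commit = line[len(MARKER):]
        elif line and commit is not None:
            index[line].append(commit)
    return dict(index)


def build_index(repo_path):
    """One-time full scan of the history (the slow part, done once)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only",
         "--pretty=format:%x01%H"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_name_only_log(out)


def record_commit(index, commit_id, changed_files):
    """What a commit hook would do: put the new commit first in each list."""
    for name in changed_files:
        index.setdefault(name, []).insert(0, commit_id)
```

Looking up a file's history is then just index.get("REPORTING-BUGS", []),
and you hand those ids to git to display them.  Storing the index
somewhere persistent, and coping with renames or oddly named paths, is
left out here.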

It sounds like such a cache could be implemented quite easily outside
of git itself.

Would that help?

That said, I'll bet you find other performance glitches when you
import millions of files and tens/hundreds of millions of commits.
But we probably won't know what those problems are until someone
imports them :)

Have fun,

Avery