On Wed, 21 Oct 2009, Bernie Innocenti wrote:

> And here's the catch: the history of individual files is not directly
> represented in a git repository. It is typically scattered across
> thousands of commit objects, with no direct links to help find them.
> If you want to retrieve the log of a file that was changed only 6
> times in the entire history of the Linux kernel, you have to dig
> through *all* of the 170K revisions in the "master" branch.
>
> And it takes some time, even though git is blazingly fast:
>
> bernie@giskard:~/src/kernel/linux-2.6$ time git log --pretty=oneline REPORTING-BUGS | wc -l
> 6
>
> real    0m1.668s
> user    0m1.416s
> sys     0m0.210s
>
> (my laptop has a low-power CPU. A fast server would be 8-10x faster.)
>
> Now, the English Wikipedia seems to have slightly more than 3M
> articles, with how many revisions? Tens of millions, for sure. Going
> through them *every time* someone needs to consult the history of a
> file would be 100x slower. Tens of seconds. Not acceptable, huh?
>
> It seems to me that the typical usage pattern of an encyclopedia is
> to change each article individually. Perhaps I'm underestimating the
> role of bots here. Anyway, there's no consistency *requirement* for
> mass changes to be applied atomically across the entire encyclopedia,
> right?

You certainly don't need to put all files in the same tree, then.
Splitting the whole thing into sections that are unlikely to overlap
would be the way to go. You could then give each section its own
branch containing no other files, or even rely on git submodules.
The partitioning doesn't have to be either of the two extremes: one
branch per file à la CVS, or all files in a single branch/tree as git
does by default.
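For illustration, here are two minimal sketches (the section names and
repository URLs below are made up for the example). Per-section
branches sharing one repository could be created as empty, unrelated
branches:

  # with a reasonably recent git: start a branch with no history
  git checkout --orphan science
  git rm -rf .    # drop the old tree; this branch will hold only
                  # this section's articles

And with submodules, each section becomes its own small repository
tracked from a superproject:

  git init wiki && cd wiki
  git submodule add git://example.org/sections/science.git science
  git submodule add git://example.org/sections/history.git history
  git commit -m "track each section as its own submodule"

Either way, "git log <article>" only has to walk the commits of one
section rather than the whole encyclopedia's history.

Nicolas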