Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges

On Mon, 23 Oct 2006, Linus Torvalds wrote:
> 
> Try it. The default "extreme" simplification is a _hell_ of a lot faster 
> than doing the full history.
[ timings removed ]

Btw, the reason it is so much faster is that it can be done early, and 
allows us to prune out parts of the history that we don't care about.

For example, when we hit a merge, and the result of that merge is 
identical to one of the parents (in the set of filenames that we are 
interested in), we can simply choose to totally ignore the other parent, 
and we don't need to traverse that history at _all_. Because clearly, all 
the actual _data_ came from just the parent that matches.

So the "extreme" simplification is way way faster, because in the presence 
of a lot of merges, it can select to go down just one of the paths, and 
totally ignore the other ones. In practice, for a fairly "bushy" history 
tree like the kernel, that can cut down the number of commits you need to 
compare by a factor of two or more.
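
To make the pruning rule concrete, here is a minimal Python sketch of it. This
is not git's actual code: the Commit class, the helper names and the toy
history are all made up for illustration, and trees are modelled as plain
dicts, whereas git itself compares tree objects. It only models the merge
pruning described above, nothing else that history simplification does.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical stand-in for a git commit; "tree" maps path -> content.
    name: str
    tree: dict
    parents: list = field(default_factory=list)


def restricted(commit, paths):
    """Project a commit's tree onto the paths we are interested in."""
    return {p: commit.tree.get(p) for p in paths}


def parents_to_follow(commit, paths, simplify=True):
    """Return the parents worth traversing from this commit."""
    if simplify and len(commit.parents) > 1:
        mine = restricted(commit, paths)
        for parent in commit.parents:
            if restricted(parent, paths) == mine:
                # The merge result matches this parent on our paths, so
                # the other parents contributed nothing we can see.
                return [parent]
    return list(commit.parents)


def walk(tip, paths, simplify=True):
    """Return the names of all commits visited when starting from tip."""
    seen, stack, order = set(), [tip], []
    while stack:
        commit = stack.pop()
        if commit.name in seen:
            continue
        seen.add(commit.name)
        order.append(commit.name)
        stack.extend(parents_to_follow(commit, paths, simplify))
    return order


# Toy history: the side branch only ever touches b.c.
root = Commit("root", {"a.c": "v1"})
main1 = Commit("main1", {"a.c": "v2"}, [root])
side1 = Commit("side1", {"a.c": "v1", "b.c": "x"}, [root])
side2 = Commit("side2", {"a.c": "v1", "b.c": "y"}, [side1])
merge = Commit("merge", {"a.c": "v2", "b.c": "y"}, [main1, side2])

simplified = walk(merge, ["a.c"])               # visits 3 commits
full = walk(merge, ["a.c"], simplify=False)     # visits all 5 commits
print(len(simplified), len(full))
```

When asking about a.c, the merge is identical to main1 on that path, so the
side branch (side1, side2) is never visited at all. The full walk has to look
at every commit.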

In many ways, it is also actually a _better_ result, in that it's a 
"closer to minimal" way of reaching a particular state. So if you're just 
interested in how something came to be, and want to just cut through the 
crap, the result of extreme simplification really _is_ better.

So the branches that were dismissed really _aren't_ important - they might 
contain real work, but from the point of view of the end result, that real 
work might as well not have happened, since the simpler history we chose 
_also_ explains the end result sufficiently.

So I think the default simplification is really a good default: not only 
because it's fundamentally cheaper, but because it is actually more likely 
to distill what you actually care about if you wonder what happened to 
a file or a set of files.

But if you care about all the "side efforts" that didn't actually matter 
for the end result too, then you'd want the more expensive, and more 
complete graph. But it _will_ be a lot more expensive to compute.

		Linus