Re: Figured out how to get Mozilla into git

Junio C Hamano wrote:
> Rogan Dawes <lists@xxxxxxxxxxxx> writes:
>
>> Here's an idea. How about separating trees and commits from the actual
>> blobs (e.g. in separate packs)?
>
> If I remember my numbers correctly, trees for any project with a
> size that matters contribute a non-negligible amount of the total
> pack weight.  Perhaps 10-25%.

Out of curiosity, do you think that it may be possible for tree objects to compress more/better if they are packed together? Or does the existing pack compression logic already do the diff against similar tree objects?

>> In this way, the user has a history that will show all of the commit
>> messages, and would be able to see _which_ files have changed over
>> time, e.g. gitk would still work (except for the actual file level
>> diff), "git log" should also still work, etc.
>
> I suspect it would make a very unpleasant system to use.
> Sometimes "git diff -p" would show diffs, and other times it
> would mysteriously complain that it lacks the blobs necessary to do
> its job.  You cannot even run fsck and tell from its output
> which missing objects are OK (because you chose to create such a
> sparse repository) and which are real corruption.

The fsck problem could be worked around by maintaining a list of objects that are explicitly not expected to be present. fsck would then report a missing object as corruption only if it is not on that list. As objects are retrieved (perhaps as diffs are performed, other parts of the blob history are fetched, etc.), the list will get shorter until we have a complete clone of the original tree.
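
To make the idea concrete, here is a rough sketch in Python of how such a check might separate tolerated gaps from real corruption. It is only an illustration under my own assumptions: the function names and the use of plain sets of object ids are hypothetical, and nothing like this exists in git's fsck today.

```python
def classify_missing(referenced, present, expected_missing):
    """Split absent objects into tolerated gaps and real corruption.

    referenced       -- ids reachable from the refs (what a check would walk)
    present          -- ids actually stored locally
    expected_missing -- ids explicitly recorded as "not expected to be present"
    """
    missing = set(referenced) - set(present)
    tolerated = missing & set(expected_missing)
    corrupt = missing - tolerated
    return tolerated, corrupt


def mark_fetched(expected_missing, object_id):
    """Shrink the list once an object has been retrieved."""
    expected_missing.discard(object_id)
```

Once the expected-missing set is empty, every absent object counts as corruption again, which is the behaviour of a complete clone.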

Of course diffs against a version further back in the history would fail. But if you start with a checkout of a complete tree, any changes made since that point would at least have one version to compare against.

In effect, what we would have is a caching repository (or as Jakub said, a lazy clone). An initial checkout would effectively be pre-seeding the cache. One does not necessarily even need to get the complete set of commit and tree objects, either. The bare minimum would probably be to get the HEAD commit, and the tree objects that correspond to that commit.

At that point, one could populate the "uncached objects" list with the parent commits. One would not be in a position to get any history at all, of course.
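
Roughly, the seeding step could look like the following Python sketch. The object model is a toy one that I am assuming purely for illustration (a dict mapping ids to tuples); it is not how git stores objects. The sketch also copies the blobs referenced by the HEAD tree, as an initial checkout would.

```python
def seed_lazy_clone(remote, head_id):
    """Return (local_objects, uncached) for a minimal lazy clone.

    `remote` is a hypothetical object store mapping an id to one of:
      ("commit", tree_id, [parent_ids])
      ("tree", [child_ids])
      ("blob",)
    """
    _kind, tree_id, parents = remote[head_id]
    local = {head_id: remote[head_id]}

    # Copy the HEAD tree, its subtrees and the blobs they reference.
    stack = [tree_id]
    while stack:
        oid = stack.pop()
        obj = remote[oid]
        local[oid] = obj
        if obj[0] == "tree":
            stack.extend(obj[1])

    # The parents of HEAD are known by id only; they start out uncached.
    uncached = set(parents)
    return local, uncached
```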

As the user performs various operations, e.g. git log, git could either go and fetch the necessary objects (updating the uncached list as it goes), or fail with a message such as "Cannot perform the requested operation - required objects are not available". (We may require another utility that would list the objects required for an operation, and compare it against the list of "uncached objects", printing out a list of which are not yet available locally. I realise that this may be expensive. Maybe a repo configuration option "cached" to enable or disable this.)
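
As a sketch of the two behaviours (fetch on demand, or fail with a clear message), something like the following Python might do. All names here are hypothetical, and `fetch` stands in for whatever transport would retrieve an object from the configured source.

```python
class MissingObjects(Exception):
    """Raised when an operation needs objects that are not available."""


def unavailable(required, uncached):
    """List the required object ids that are still on the uncached list."""
    return sorted(set(required) & set(uncached))


def read_object(oid, local, uncached, fetch=None):
    """Return an object, fetching it on demand if a fetcher is given."""
    if oid in local:
        return local[oid]
    if oid in uncached and fetch is not None:
        obj = fetch(oid)
        local[oid] = obj       # cache it for next time
        uncached.discard(oid)  # no longer expected to be missing
        return obj
    raise MissingObjects(
        "Cannot perform the requested operation - "
        "required objects are not available"
    )
```

Passing `fetch=None` models working offline: anything on the uncached list produces the error instead of a network request.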

As Jakub suggested, it would be necessary to configure the location of the source for any missing objects, but that is probably in the repo config anyway.

> A shallow clone with explicit cauterization in the grafts file at
> least would not have that problem. Although the user will still
> not see the exact same result as what would happen in a full
> repository, at least we can say "your git log ends at that
> commit because your copy of the history does not go back beyond
> that" and the user would understand.

Or we could say, "perform the operation while you are online and can access the necessary objects". If the user has explicitly chosen to make a lazy clone, then they should expect that at some point, whatever they do may require them to be online to access items that they have not yet cloned.

Rogan