Jeff King <peff@xxxxxxxx> writes:

>  1. You really have 100G of data in the current version that doesn't
>     compress well (e.g., you are storing your music collection). You
>     can't afford to store two copies on your laptop (because you have a
>     fancy SSD, and 100G is expensive again). You need the working tree
>     version, but it's OK to stream the repo version of a blob from the
>     network when you actually need it (mostly "checkout", assuming you
>     have marked the file as "-diff").

This feels like a good candidate for an independent project that
allows you to fuse-mount from a remote repository to give you an
illusion that you have a checkout of a specific version. Such a
remote fuse-server would be an application that is built using Git,
but I do not think we have any business on the client end in such a
setup.

So I'll write it off as a "non-Git" issue for now.

The other parts of your message are much more interesting.

> Right. This is the same concept, except over the network. So people's
> working repositories are on their own workstations instead of a central
> server. You could even do it today by network-mounting a filesystem and
> pointing your alternates file at it. However, I think it's worth making
> git aware that the objects are on the network for a few reasons:
>
>  1. Git can be more careful about how it handles the objects, including
>     when to fetch, when to stream, and when to cache. For example,
>     you'd want to fetch the manifest of objects and cache it in your
>     local repository, because you want fast lookups of "do I have this
>     object".
>
>  2. Providing remote filesystems on an Internet scale is a management
>     pain (and it's a pain for the user, too). My thought was that this
>     would be implemented on top of http (the connection setup cost is
>     negligible, since these objects would generally be large).
>
>  3. Usually alternate repositories are full repositories that meet the
>     connectivity requirements (so you could run "git fsck" in them).
>     But this is explicitly about taking just a few disconnected large
>     blobs out of the repository and putting them elsewhere. So it needs
>     a new set of tools for managing the upstream repository.

Or you can split the really large write-only blobs out of SCM
control. Every time you introduce a new blob, throw it verbatim in
an append-only directory on a networked filesystem under some unique
ID as its filename, and maintain a symlink into that networked
filesystem under SCM control.

I think git-annex already does something like that...
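
A minimal sketch of the append-only directory plus symlink idea, in
Python. Everything named here is an assumption made for illustration:
the helper stash_blob, the choice of SHA-256 of the content as the
"unique ID", and the absolute symlink target are hypothetical, and none
of it is taken from git-annex.

```python
import hashlib
import os
import shutil
import tempfile


def stash_blob(src_path, store_dir, link_path):
    """Copy src_path into store_dir under its SHA-256 hex digest and
    leave a symlink at link_path pointing at the stored copy.

    Hypothetical helper; store_dir stands in for the append-only
    directory on the networked filesystem.
    """
    # Hash the content in 1 MiB chunks so large blobs are never
    # held in memory all at once.
    digest = hashlib.sha256()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    blob_id = digest.hexdigest()

    dest = os.path.join(store_dir, blob_id)
    if not os.path.exists(dest):
        # Copy under a temporary name first, then rename, so a
        # half-written blob never shows up under its final ID.
        fd, tmp = tempfile.mkstemp(dir=store_dir)
        os.close(fd)
        shutil.copyfile(src_path, tmp)
        os.chmod(tmp, 0o444)  # stored blobs are never rewritten
        os.rename(tmp, dest)

    # Replace whatever is at link_path (possibly the original file
    # itself) with a symlink; the symlink is what goes under SCM control.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(dest, link_path)
    return blob_id
```

Calling stash_blob(path, store, path) replaces a large file in the
working tree with a symlink to its stored copy, and only the symlink
would then be added to the repository. An absolute target only works if
every checkout mounts the store at the same place, which is one of the
things a real tool would have to deal with.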