On Mon, Apr 02, 2012 at 03:19:35PM -0700, Junio C Hamano wrote: > > 1. You really have 100G of data in the current version that doesn't > > compress well (e.g., you are storing your music collection). You > > can't afford to store two copies on your laptop (because you have a > > fancy SSD, and 100G is expensive again). You need the working tree > > version, but it's OK to stream the repo version of a blob from the > > network when you actually need it (mostly "checkout", assuming you > > have marked the file as "-diff"). > > This feels like a good candidate for an independent project that allows > you fuse-mount from a remote repository to give you an illusion that you > have a checkout of a specific version. Such a remote fuse-server would be > an application that is built using Git, but I do not think we are in any > business on the client end in such a setup. I think this is backwards. The primary item you want on the laptop is the working directory, because you will be accessing and manipulating the files. That must always work, whether the network is connected or not. You occasionally will want to perform git operations. Most of these should succeed when disconnected, but it's OK for some operations (like checking out an older version of a large blob) to fail. But if you are mounting a remote repository and pretending that you have a local checkout, then just accessing the files either requires a network, or you end up caching most of the remote repository. It would make more sense to me to clone a bare repository of what's upstream, and then fuse-mount the local bare repository to provide a fake working directory. And I believe somebody made such a fuse filesystem in the early days of git. However, I recall that it was read-only. I'm not sure how you would handle writing to the git-mounted directory. > Or you can split out the really large write-only blobs out of SCM control. > Every time you introduce a new blob, throw it verbatim in an append-only > directory on a networked filesystem under some unique ID as its filename, > and maintain a symlink into that networked filesystem under SCM control. > > I think git-annex already does something like that... Yes, and git-media basically does this, too. But it's awful to use, because the user has to be constantly aware of these special links and managing them. You can't just store a symlink into the networked filesystem. For one thing, the path may be different on each client machine, so a simple symlink doesn't work. For another, symlinks into a blob repository mean that the files must be read-only (since they are basically blob-equivalents). So you don't really get your own copy of the file; you can _replace_ it and update the symlink, but you can't actually modify it. So what things like git-media end up doing is to try to insert themselves between git and the user, and transparently convert the file into its unique ID on "git add" and tweak the working directory to contain the actual file on checkout. And it kind of works, but there are a lot of rough edges (I don't recall the details, but they came up in past discussions; clean and smudge filters almost get you there, but not quite). Basically what I'm proposing to do is to just move that logic into git itself, so it can just happen at the blob storage level. I don't think it would even be that much code inside git; you'd want the interface to be pluggable, so all of the heavy lifting would happen inside of a helper (so really, this isn't necessarily even "network alternates" as much as "pluggable alternates"). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html