Re: GSoC - Some questions on the idea of

Jeff King <peff@xxxxxxxx> writes:

>   1. You really have 100G of data in the current version that doesn't
>      compress well (e.g., you are storing your music collection). You
>      can't afford to store two copies on your laptop (because you have a
>      fancy SSD, and 100G is expensive again).  You need the working tree
>      version, but it's OK to stream the repo version of a blob from the
>      network when you actually need it (mostly "checkout", assuming you
>      have marked the file as "-diff").

This feels like a good candidate for an independent project that lets
you FUSE-mount a remote repository to give you the illusion that you
have a checkout of a specific version.  Such a remote FUSE server would be
an application built on top of Git, but I do not think Git itself has any
business on the client end of such a setup.

So I'll write it off as a "non-Git" issue for now.

The other parts of your message are much more interesting.

> Right. This is the same concept, except over the network. So people's
> working repositories are on their own workstations instead of a central
> server. You could even do it today by network-mounting a filesystem and
> pointing your alternates file at it. However, I think it's worth making
> git aware that the objects are on the network for a few reasons:
>
>   1. Git can be more careful about how it handles the objects, including
>      when to fetch, when to stream, and when to cache. For example,
>      you'd want to fetch the manifest of objects and cache it in your
>      local repository, because you want fast lookups of "do I have this
>      object".
>
>   2. Providing remote filesystems on an Internet scale is a management
>      pain (and it's a pain for the user, too). My thought was that this
>      would be implemented on top of http (the connection setup cost is
>      negligible, since these objects would generally be large).
>
>   3. Usually alternate repositories are full repositories that meet the
>      connectivity requirements (so you could run "git fsck" in them).
>      But this is explicitly about taking just a few disconnected large
>      blobs out of the repository and putting them elsewhere. So it needs
>      a new set of tools for managing the upstream repository.

Or you can move the really large write-once blobs out of SCM control.
Every time you introduce a new blob, store it verbatim in an append-only
directory on a networked filesystem, using some unique ID as its filename,
and keep a symlink into that networked filesystem under SCM control.

I think git-annex already does something like that...
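
To make the idea concrete, here is a rough sketch of the "append-only
directory plus symlink" scheme.  It is only an illustration: the function
name stash_blob, the choice of a SHA-256 content hash as the "unique ID",
and the directory layout are all assumptions of mine, not something Git
or git-annex defines.

```python
import hashlib
import os
import shutil
import tempfile


def stash_blob(src_path, store_dir, link_path):
    """Copy src_path into store_dir under a content-derived name and
    point link_path (the path kept under SCM control) at that copy.

    Hypothetical helper: using SHA-256 of the content as the unique ID
    is an assumption; any collision-free naming scheme would do.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as f:
        # Hash in 1 MiB chunks so huge blobs never sit fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    blob_id = digest.hexdigest()

    os.makedirs(store_dir, exist_ok=True)
    dest = os.path.join(store_dir, blob_id)
    if not os.path.exists(dest):
        # Append-only: existing entries are never rewritten.  Copy to a
        # temporary name first so a partial file never appears as dest.
        fd, tmp = tempfile.mkstemp(dir=store_dir)
        os.close(fd)
        shutil.copyfile(src_path, tmp)
        os.replace(tmp, dest)

    # Replace whatever is at link_path with a symlink into the store.
    if os.path.lexists(link_path):
        os.remove(link_path)
    os.symlink(dest, link_path)
    return blob_id
```

With store_dir on the networked filesystem, only the small symlink is
committed; the blob itself is fetched over the mount when somebody
actually opens the file.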