[RFC 0/4] Shallow clones with on-demand fetch

Hello everyone,

This is an RFC for an enhancement to shallow repositories to make them
behave more like full clones.

I was inspired a bit by Microsoft's announcement of their Git VFS.  I
saw that people have talked in the past about making git fetch objects
from remotes as they are needed, and decided to give it a try.

The patch series adds a "--on-demand" option to git clone, which, when
used in conjunction with the existing shallow clone options, clones
the full history of the repository's commits, but only the files that
would be included in the shallow clone.

When a file that is missing is required, git requests the file on-demand
from the remote, via a new 'upload-file' service.
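
The lookup order described above can be sketched roughly as follows.  This
is only a toy illustration, not the code in the patches: the blob struct,
find_local(), read_blob() and fetch_from_remote() are all hypothetical
names, and the "remote" is a stub that returns a fixed string.

```c
#include <stddef.h>
#include <string.h>

/* Toy object store: an id and its contents.  Not git's real data model. */
struct blob {
	const char *id;
	const char *data;
};

/* Counts round trips made to the (pretend) remote. */
static int remote_requests;

/* Hypothetical stand-in for asking the remote's upload-file service. */
static const char *fetch_from_remote(const char *id)
{
	(void)id;
	remote_requests++;
	return "contents fetched on demand";
}

/* Look for the object locally; NULL means it is missing. */
static const char *find_local(const struct blob *store, size_t n,
			      const char *id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(store[i].id, id))
			return store[i].data;
	return NULL;
}

/* Local lookup first, then fall back to the remote only on a miss. */
static const char *read_blob(const struct blob *store, size_t n,
			     const char *id)
{
	const char *data = find_local(store, n, id);

	return data ? data : fetch_from_remote(id);
}
```

The point of the ordering is that objects already present locally never
cost a network round trip; only a miss reaches the remote.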

Public git servers are unlikely to want to enable this, due to the
additional load it may cause, but within an organization's own network, it
will allow full access to the repository history without needing a full
initial clone.

The patch set is in four parts:
  1:
    Adds the "upload-file" command, which starts a new protocol
    conversation with the client allowing it to request file info and
    file contents.  The connection is kept open so that the client
    can make as many requests as it likes.  The client terminates the
    connection by sending a packet containing "end".
  2:
    Adds the ability for file info and content to be requested from
    the remote if the file cannot be found in any pack, or loose in
    the repository.  Currently this only looks at the default remote,
    but the intention is this would be configurable.
  3:
    Adds the "on-demand" capability to "upload-pack".  When a client
    requests this capability, "upload-pack" includes in the pack
    all commits, even those that would normally be dropped by the
    shallow clone.
  4:
    Adds the "--on-demand" option to clone, to request an on-demand
    shallow clone.
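
To make the "end" packet from part 1 concrete, here is a minimal sketch of
how it might look on the wire, assuming upload-file reuses git's existing
pkt-line framing (a 4-digit hex length that counts its own four bytes,
followed by the payload).  That framing is an assumption on my part for
illustration, and pkt_line_encode() is a hypothetical helper, not a
function from the patches.  The request formats for file info and file
contents are not described here, so only "end" is shown.

```c
#include <stdio.h>
#include <string.h>

/* Largest pkt-line git allows, including the 4-byte length prefix. */
#define PKT_LINE_MAX 65520

/*
 * Encode `payload` as one pkt-line into `out` (capacity `cap`).
 * Returns the number of bytes written (excluding the trailing NUL),
 * or -1 if the packet is too large or does not fit in `out`.
 */
static int pkt_line_encode(char *out, size_t cap, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > PKT_LINE_MAX || len + 1 > cap)
		return -1;
	snprintf(out, cap, "%04zx%s", len, payload);
	return (int)len;
}
```

Under that assumption, encoding "end" gives the seven bytes "0007end",
which is what the client would send to close the conversation.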

This is a proof-of-concept, so it is in no way complete.  It contains a
few hacks to make it work, but these can be ironed out with a bit more
work.  What I have so far is sufficient to try out the idea.  I'd like
to get people's opinions on it before I spend any more time working on
it.  I'm also not very familiar with the git codebase, so some help
would be appreciated.

As an example, the Linux repository currently stands at 2.0GB of packed
data.  A "git clone --shallow-since=2016-01-01 --on-demand" is only
561MB, and yet remains fully functional.  A git blame on the Makefile,
for example, shows all changes to the file, right back to Linus's
original commit in 2005.

Still to do:

 - Fix up the hacks and make everything work correctly.
 - Make fetching of further updates work correctly.
 - Store the retrieved files in an LRU cache, possibly with the option
   of storing them in the main repo data, too.
 - Add a gc/enshallow operation to make the repo shallower by forgetting
   old files, or moving them to the LRU cache.
 - Add configurable remote to fetch from.
 - Documentation.
 - Much more.

Please let me know what you think, and if an experienced git developer
would like to help out with finishing this, that would be even better.

Mark Thomas (4):
  upload-file: Add upload-file command
  on-demand: Fetch missing files from remote
  upload-pack: Send all commits if client requests on-demand
  clone: Request on-demand shallow clones

 .gitignore             |   1 +
 Makefile               |   3 +
 builtin/clone.c        |   7 +-
 builtin/pack-objects.c |  26 ++++++-
 cache-tree.c           |   2 +-
 cache.h                |   3 +-
 daemon.c               |   6 ++
 fetch-pack.c           |   3 +
 fetch-pack.h           |   1 +
 list-objects.c         |  12 ++--
 list-objects.h         |  13 +++-
 object.h               |   1 +
 on_demand.c            | 183 +++++++++++++++++++++++++++++++++++++++++++++++++
 on_demand.h            |  12 ++++
 sha1_file.c            |   8 ++-
 shallow.c              |   2 +-
 transport.c            |   3 +
 transport.h            |   4 ++
 upload-file.c          |  87 +++++++++++++++++++++++
 upload-pack.c          |   8 ++-
 20 files changed, 370 insertions(+), 15 deletions(-)
 create mode 100644 on_demand.c
 create mode 100644 on_demand.h
 create mode 100644 upload-file.c

-- 
2.7.4



