[PATCH 00/13] RFC object filtering for partial clone

From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>


This patch series contains WIP code demonstrating object (blob) filtering
in rev-list and pack-objects using a common filtering API in
list-objects and traverse_commit_list().  The API lets both commands
perform the same kinds of filter operations and is intended to serve
as the basis for partial clone and partial fetch.
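To illustrate the idea of one filtering API shared by two commands, here
is a minimal C sketch.  All names in it (struct obj, filter_fn,
traverse_filtered(), and so on) are hypothetical and are not the
interfaces in this series; a flat array stands in for the real
commit/tree walk.  The point is only that the traversal consults a
pluggable filter for each blob, and each command decides what "show"
and "omit" mean.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object kinds; not Git's enum object_type. */
enum obj_kind { T_COMMIT, T_TREE, T_BLOB };

/* Hypothetical stand-in for an object reached during traversal. */
struct obj {
	const char *oid;      /* object name (illustrative string) */
	enum obj_kind kind;
	unsigned long size;   /* object size in bytes */
	const char *path;     /* path a blob was reached through, or NULL */
};

/* Verdict a filter returns for each blob it is offered. */
enum filter_result { FILTER_SHOW, FILTER_OMIT };

/* A filter is a callback plus opaque per-filter data. */
typedef enum filter_result (*filter_fn)(const struct obj *o, void *data);

/* Per-command action for shown or omitted objects. */
typedef void (*visit_fn)(const struct obj *o, void *ctx);

/*
 * Walk a flat list of reachable objects (standing in for the real
 * commit/tree walk).  Commits and trees are always shown; only blobs are
 * offered to the filter, if one is given.  A rev-list-like caller could
 * print from show(), while a pack-objects-like caller could add the
 * object to its list of objects to pack.
 */
void traverse_filtered(const struct obj *objs, size_t n,
		       filter_fn filter, void *filter_data,
		       visit_fn show, visit_fn omit, void *ctx)
{
	size_t i;

	assert(show != NULL);
	for (i = 0; i < n; i++) {
		const struct obj *o = &objs[i];
		enum filter_result r = FILTER_SHOW;

		if (o->kind == T_BLOB && filter)
			r = filter(o, filter_data);
		if (r == FILTER_SHOW)
			show(o, ctx);
		else if (omit)
			omit(o, ctx);
	}
}

/* Example filter: omit every blob. */
enum filter_result omit_all_blobs(const struct obj *o, void *data)
{
	(void)o;
	(void)data;
	return FILTER_OMIT;
}

/* Example callbacks that just count what they see. */
struct counts { size_t shown, omitted; };

void count_shown(const struct obj *o, void *ctx)
{
	(void)o;
	((struct counts *)ctx)->shown++;
}

void count_omitted(const struct obj *o, void *ctx)
{
	(void)o;
	((struct counts *)ctx)->omitted++;
}
```

With omit_all_blobs as the filter, a commit, a tree and two blobs yield
two shown objects and two omitted ones; passing NULL as the filter shows
all four.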

This draft contains filters to:
- omit all blobs
- omit blobs larger than some size
- omit blobs using a sparse-checkout specification
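The three kinds of filter can be pictured as simple predicates over a
blob.  The following C sketch is illustrative only: struct blob_info,
struct sparse_spec and the filter_* functions are hypothetical names,
the size filter assumes "larger than" means strictly greater, and the
sparse filter is greatly simplified to directory-prefix matching,
whereas real sparse-checkout files use gitignore-style patterns
(including negation).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a blob reached during traversal (hypothetical). */
struct blob_info {
	unsigned long size;  /* blob size in bytes */
	const char *path;    /* repository-relative path, e.g. "src/main.c" */
};

enum filter_result { FILTER_SHOW, FILTER_OMIT };

/* Filter 1: omit every blob. */
enum filter_result filter_omit_all(const struct blob_info *b, const void *data)
{
	(void)b;
	(void)data;
	return FILTER_OMIT;
}

/* Filter 2: omit blobs strictly larger than *limit bytes. */
enum filter_result filter_omit_large(const struct blob_info *b, const void *data)
{
	const unsigned long *limit = data;

	assert(limit != NULL);
	return b->size > *limit ? FILTER_OMIT : FILTER_SHOW;
}

/*
 * Filter 3 (simplified): keep a blob only if its path starts with one
 * of the given directory prefixes; omit everything else.
 */
struct sparse_spec {
	const char **prefixes;
	size_t nr;
};

enum filter_result filter_sparse(const struct blob_info *b, const void *data)
{
	const struct sparse_spec *spec = data;
	size_t i;

	for (i = 0; i < spec->nr; i++) {
		size_t len = strlen(spec->prefixes[i]);

		if (!strncmp(b->path, spec->prefixes[i], len))
			return FILTER_SHOW;
	}
	return FILTER_OMIT;
}
```

Because each predicate only sees one blob plus its own data, new filter
types can be added without touching the traversal itself.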

In addition to options for specifying the filter criteria, rev-list
gains options to:
- print a list of the objects omitted by the current filter criteria
- print a list of missing objects (typically left out by a prior
  partial clone or fetch)

The latter option can be used with or without new filter criteria,
which lets it feed a bulk prefetch command run before checkout.

For example, if blobs were omitted during a clone or fetch, the
client can run:

   git rev-list --quiet --objects --filter-print-missing NEWBRANCH

and get a list of just the missing objects required to check out
NEWBRANCH.

Or, if a sparse checkout is in effect, the client can specify the
same criteria to look for just the missing blobs needed to populate
the sparse checkout:

   git rev-list --quiet --objects --filter-print-missing \
       --filter-use-path=.git/info/sparse-checkout NEWBRANCH

It does not matter why a blob is missing, that is, which filter
criteria were used during the clone or fetch.  All that matters is
that the blob is missing and is now needed.
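The reasoning above can be sketched in C: for each reachable blob, first
apply the current filter (if any), then report the blob as missing only
if the filter keeps it and it is absent locally.  Why it is absent never
enters the decision.  This is a hypothetical sketch; struct blob_ref,
classify_blobs() and omit_larger_than() are illustrative names, and the
"present" flag stands in for a lookup in the local object store.

```c
#include <assert.h>
#include <stddef.h>

/* A reachable blob and whether its content exists locally (hypothetical). */
struct blob_ref {
	const char *oid;
	unsigned long size;
	int present;   /* nonzero if the object is in the local object store */
};

/* Optional filter: returns nonzero if the blob should be omitted. */
typedef int (*omit_fn)(const struct blob_ref *b, const void *data);

struct report {
	size_t omitted;  /* filtered out by the current criteria */
	size_t missing;  /* kept by the current criteria but absent locally */
	size_t present;  /* kept and already available */
};

/*
 * Classify each reachable blob.  With filter == NULL nothing is omitted,
 * so every absent blob counts as missing; with a filter, only absent
 * blobs that the filter keeps count as missing.
 */
struct report classify_blobs(const struct blob_ref *blobs, size_t n,
			     omit_fn filter, const void *data)
{
	struct report r = {0, 0, 0};
	size_t i;

	assert(blobs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (filter && filter(&blobs[i], data))
			r.omitted++;
		else if (!blobs[i].present)
			r.missing++;  /* a print-missing option would emit blobs[i].oid */
		else
			r.present++;
	}
	return r;
}

/* Example filter: omit blobs strictly larger than *data bytes. */
int omit_larger_than(const struct blob_ref *b, const void *data)
{
	return b->size > *(const unsigned long *)data;
}
```

Run without a filter, every absent blob is reported; run with a size
filter, an absent but oversized blob is counted as omitted instead, so
only the blobs the current criteria actually need end up in the missing
list.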

These commands output a list of missing blobs that can be fed
into a bulk object fetch request.  The goal is to minimize reliance
on the dynamic object fetch mechanisms currently being discussed.
(We cannot eliminate the need for dynamic fetching, but this lets
us precompute the needed objects and prefetch them in bulk.)
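One way to picture the benefit of bulk prefetch is request batching.
The C sketch below is hypothetical (fetch_batch_fn, fetch_in_batches()
and the batch size are illustrative, not part of any Git protocol): it
splits the list of missing object names into batches and issues one
request per batch, so n missing blobs cost ceil(n / batch_size) round
trips instead of one per blob fetched on demand.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that would issue one bulk fetch request (hypothetical). */
typedef void (*fetch_batch_fn)(const char **oids, size_t nr, void *ctx);

/*
 * Split the list of missing object names into batches of at most
 * batch_size entries and hand each batch to send().  Returns the number
 * of requests issued, i.e. ceil(nr / batch_size).
 */
size_t fetch_in_batches(const char **oids, size_t nr, size_t batch_size,
			fetch_batch_fn send, void *ctx)
{
	size_t start, requests = 0;

	assert(batch_size > 0);
	for (start = 0; start < nr; start += batch_size) {
		size_t len = nr - start < batch_size ? nr - start : batch_size;

		send(oids + start, len, ctx);
		requests++;
	}
	return requests;
}

/* Example sender that only tallies how many objects were requested. */
void count_objects(const char **oids, size_t nr, void *ctx)
{
	(void)oids;
	*(size_t *)ctx += nr;
}
```

For five missing blobs and a batch size of 2, this issues three
requests covering all five objects; fetching each blob dynamically when
checkout first touches it would take five.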

Pack-objects was updated so that the server can build incomplete
packfiles that omit unwanted blobs.
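As a rough illustration of what "incomplete" means here, the following
hypothetical C sketch selects objects for a pack under an assumed
size-limit filter: every commit and tree is kept, but oversized blobs
are skipped, so trees in the pack may reference blobs the pack does not
contain.  struct pack_candidate and select_for_pack() are illustrative
names only, not pack-objects internals.

```c
#include <assert.h>
#include <stddef.h>

enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB };

/* Hypothetical candidate object for the pack. */
struct pack_candidate {
	const char *oid;
	enum obj_kind kind;
	unsigned long size;  /* bytes */
};

/*
 * Store in out[] pointers to the objects to write into the pack: every
 * commit and tree, plus only those blobs no larger than blob_limit.
 * Returns how many objects were selected; out must have room for n.
 */
size_t select_for_pack(const struct pack_candidate *in, size_t n,
		       unsigned long blob_limit,
		       const struct pack_candidate **out)
{
	size_t i, nr = 0;

	assert(out != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (in[i].kind == KIND_BLOB && in[i].size > blob_limit)
			continue;  /* unwanted blob: leave it out of the pack */
		out[nr++] = &in[i];
	}
	return nr;
}
```

Keeping all commits and trees preserves the full history and directory
structure on the client; only blob content is deferred, which is what
the missing-object listing above is designed to recover later.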

This is the first step toward supporting partial clone and partial
fetch.  I have omitted from this patch series the corresponding
changes to fetch-pack, upload-pack, index-pack, verify-pack, fsck,
gc, and the git protocol.  I can make these available if there is
interest; I left them out of this RFC so as not to distract from the
basic filtering ideas.

This series also does not address the promisor/promised ideas
currently being discussed [2,3].  Those should be considered
independently.

The code in this patch series can be seen here [1].

[1] https://github.com/jeffhostetler/git/pull/3
[2] https://public-inbox.org/git/xmqq8thbqlqf.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/t/
[3] https://github.com/jonathantanmy/git/commits/partialclone2


Jeff Hostetler (13):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-all: add filter to omit all blobs
  list-objects-filter-large: add large blob filter to list-objects
  list-objects-filter-sparse: add sparse-checkout based filter
  object-filter: common declarations for object filtering
  list-objects: add traverse_commit_list_filtered method
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text

 Documentation/git-pack-objects.txt  |  17 +++
 Documentation/git-rev-list.txt      |   9 +-
 Documentation/rev-list-options.txt  |  32 +++++
 Makefile                            |   5 +
 builtin/pack-objects.c              |  24 +++-
 builtin/rev-list.c                  |  73 +++++++++-
 dir.c                               |  53 ++++++-
 dir.h                               |   4 +
 list-objects-filter-all.c           |  85 ++++++++++++
 list-objects-filter-all.h           |  18 +++
 list-objects-filter-large.c         | 108 +++++++++++++++
 list-objects-filter-large.h         |  18 +++
 list-objects-filter-sparse.c        | 221 +++++++++++++++++++++++++++++
 list-objects-filter-sparse.h        |  30 ++++
 list-objects.c                      | 100 +++++++++++---
 list-objects.h                      |  41 ++++++
 object-filter.c                     | 269 ++++++++++++++++++++++++++++++++++++
 object-filter.h                     | 173 +++++++++++++++++++++++
 oidset2.c                           | 104 ++++++++++++++
 oidset2.h                           |  58 ++++++++
 t/t6112-rev-list-filters-objects.sh | 237 +++++++++++++++++++++++++++++++
 21 files changed, 1657 insertions(+), 22 deletions(-)
 create mode 100644 list-objects-filter-all.c
 create mode 100644 list-objects-filter-all.h
 create mode 100644 list-objects-filter-large.c
 create mode 100644 list-objects-filter-large.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3