From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

This patch series contains WIP code demonstrating object (blob)
filtering in rev-list and pack-objects. Both commands use a common
filtering API in list-objects and traverse-commit-list, so they can
perform the same types of filter operations. The series is intended
to serve as the basis of partial-clone and partial-fetch.

This draft contains filters to:
  * omit all blobs
  * omit blobs larger than some size
  * omit blobs using a sparse-checkout specification

In addition to specifying the filter criteria, the rev-list command
was updated to include options to:
  * print a list of the omitted objects (due to the current filtering
    criteria)
  * print a list of missing objects (probably from a prior partial
    clone/fetch)

This latter print option can be used with or without new filter
criteria, allowing it to be used with a pre-checkout bulk pre-fetch
command.

For example, if blobs were omitted during the clone or a fetch, the
client can do:

    git rev-list --quiet --objects --filter-print-missing NEWBRANCH

and get a list of just the objects that are required to check out
NEWBRANCH.

Or, if a sparse-checkout is in effect, the client can specify the same
criteria to look for just the missing blobs needed to do the
sparse-checkout:

    git rev-list --quiet --objects --filter-print-missing \
        --filter-use-path=.git/info/sparse-checkout NEWBRANCH

It does not matter why a blob is missing; that is, which filter
criteria were used during the clone or fetch. All that matters is that
the blob is missing and is now needed. These commands output a list of
missing blobs that can be fed into a bulk fetch object request.

The goal here is to minimize the need for the dynamic object fetch
mechanisms currently being discussed. (We cannot eliminate the need
for dynamic fetching, but we can use this to precompute/prefetch in
bulk.)

Pack-objects was updated to allow the server to build incomplete
packfiles without unwanted blobs.
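
To make the first two filters concrete, here is a minimal,
self-contained sketch of the kind of per-object decision such a filter
makes during traversal. This is not the code in this series: every
name below (sketch_obj_type, sketch_filter, sketch_should_omit) is
hypothetical and chosen only for illustration, and the sparse-checkout
filter is left out because it needs path matching.

```c
#include <stddef.h>

/* Hypothetical object kinds; NOT git's real enum object_type. */
enum sketch_obj_type { SKETCH_COMMIT, SKETCH_TREE, SKETCH_BLOB };

/* Hypothetical filter modes mirroring two of the filters described above. */
enum sketch_filter_mode {
	SKETCH_FILTER_NONE,             /* keep everything */
	SKETCH_FILTER_OMIT_ALL_BLOBS,   /* omit all blobs */
	SKETCH_FILTER_OMIT_LARGE_BLOBS  /* omit blobs larger than max_bytes */
};

struct sketch_filter {
	enum sketch_filter_mode mode;
	unsigned long max_bytes; /* only consulted by OMIT_LARGE_BLOBS */
};

/*
 * Decide whether one object should be left out of the result.
 * Returns 1 to omit the object and 0 to keep it.
 * Commits and trees are always kept; only blobs are ever filtered.
 */
static int sketch_should_omit(const struct sketch_filter *f,
			      enum sketch_obj_type type,
			      unsigned long size)
{
	if (type != SKETCH_BLOB)
		return 0;

	switch (f->mode) {
	case SKETCH_FILTER_OMIT_ALL_BLOBS:
		return 1;
	case SKETCH_FILTER_OMIT_LARGE_BLOBS:
		return size > f->max_bytes;
	case SKETCH_FILTER_NONE:
	default:
		return 0;
	}
}
```

In this sketch a caller walking the object graph would consult
sketch_should_omit() once per object and either emit the object or
record it as omitted, which is roughly the split between the "omitted"
and "missing" listings described above.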

This is the first step to support partial-clone and -fetch. I've
omitted from this patch series the corresponding changes to
fetch-pack, upload-pack, index-pack, verify-pack, fsck, gc, and the
git protocol. I can make these available if there is interest. I omit
them from this RFC so as not to distract from the basic filtering
ideas.

This series also does not address the promisor/promised ideas
currently being discussed [2,3]. These should be considered
independently.

The code in this patch series can be seen here [1].

[1] https://github.com/jeffhostetler/git/pull/3
[2] https://public-inbox.org/git/xmqq8thbqlqf.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/t/
[3] https://github.com/jonathantanmy/git/commits/partialclone2

Jeff Hostetler (13):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-all: add filter to omit all blobs
  list-objects-filter-large: add large blob filter to list-objects
  list-objects-filter-sparse: add sparse-checkout based filter
  object-filter: common declarations for object filtering
  list-objects: add traverse_commit_list_filtered method
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text

 Documentation/git-pack-objects.txt  |  17 +++
 Documentation/git-rev-list.txt      |   9 +-
 Documentation/rev-list-options.txt  |  32 +++++
 Makefile                            |   5 +
 builtin/pack-objects.c              |  24 +++-
 builtin/rev-list.c                  |  73 +++++++++-
 dir.c                               |  53 ++++++-
 dir.h                               |   4 +
 list-objects-filter-all.c           |  85 ++++++++++++
 list-objects-filter-all.h           |  18 +++
 list-objects-filter-large.c         | 108 +++++++++++++++
 list-objects-filter-large.h         |  18 +++
 list-objects-filter-sparse.c        | 221 +++++++++++++++++++++++++++++
 list-objects-filter-sparse.h        |  30 ++++
 list-objects.c                      | 100 +++++++++++---
 list-objects.h                      |  41 ++++++
 object-filter.c                     | 269 ++++++++++++++++++++++++++++++++++++
 object-filter.h                     | 173 +++++++++++++++++++++++
 oidset2.c                           | 104 ++++++++++++++
 oidset2.h                           |  58 ++++++++
 t/t6112-rev-list-filters-objects.sh | 237 +++++++++++++++++++++++++++++++
 21 files changed, 1657 insertions(+), 22 deletions(-)
 create mode 100644 list-objects-filter-all.c
 create mode 100644 list-objects-filter-all.h
 create mode 100644 list-objects-filter-large.c
 create mode 100644 list-objects-filter-large.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3