From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

This WIP is a follow up to my earlier patch series to teach pack-objects
to omit large blobs from packfiles. [1]

Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback. This hides the filtering logic
in list-objects.c and list-objects-filters.c and minimizes the changes
to actual commands, such as pack-objects.

This version adds that same filtering capability to rev-list allowing
filtering to be demonstrated without building a packfile. Filtered blobs
are printed with a leading "~" (along with their sizes).

    $ ./git rev-list --objects HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

This version contains 3 filters:

1. filter-omit-all-blobs to exclude all blobs (trees and commits only).
2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
   (but always including ".git*" special files).
3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
   corresponding sparse-checkout.

Sparse-checkout filtering is currently limited to filtering unneeded
blobs. A later enhancement should be able to also filter unneeded tree
objects.

This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.
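
For illustration only, the omit-large-blobs rule from the filter list
above could be sketched in C roughly as follows. The helper names
(parse_size_limit, is_git_special_file, omit_large_blob) are hypothetical
and are not taken from the series. The sketch also assumes that k/m/g
mean powers of 1024 and that a ".git*" special file is any path whose
final component starts with ".git"; the actual patches may decide both
differently.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "<n>[kmg]" into a byte count.
 * Assumes k/m/g are multiples of 1024. Returns 0 on success, -1 on error.
 */
static int parse_size_limit(const char *arg, uint64_t *out)
{
    char *end;
    unsigned long long n;
    unsigned shift = 0;

    /* Reject empty input, signs, and leading whitespace up front. */
    if (!isdigit((unsigned char)arg[0]))
        return -1;

    errno = 0;
    n = strtoull(arg, &end, 10);
    if (errno)
        return -1;

    switch (*end) {
    case 'k': shift = 10; end++; break;
    case 'm': shift = 20; end++; break;
    case 'g': shift = 30; end++; break;
    }
    if (*end != '\0')
        return -1; /* trailing garbage such as "10x" or "10kk" */
    if (n > (UINT64_MAX >> shift))
        return -1; /* scaled value would overflow 64 bits */

    *out = (uint64_t)n << shift;
    return 0;
}

/* Assumption: "special" means the last path component starts with ".git". */
static int is_git_special_file(const char *path)
{
    const char *base = strrchr(path, '/');

    base = base ? base + 1 : path;
    return strncmp(base, ".git", 4) == 0;
}

/* Returns 1 if the blob would be omitted, 0 if it would be kept. */
static int omit_large_blob(const char *path, uint64_t size, uint64_t limit)
{
    if (is_git_special_file(path))
        return 0; /* special files are always included */
    return size > limit;
}
```

Under these assumptions, a limit of "10k" would omit the 40732-byte
builtin/fetch.c blob from the rev-list example, while a .gitattributes
file of any size would still be included.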

As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks. Maintaining info on
missing blobs is currently being discussed in [3].

TODO

1. Integrate with a patch series like [4] to dynamically fetch a missing
   blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default to
   them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.

[1] https://public-inbox.org/git/20170622203615.34135-1-git@xxxxxxxxxxxxxxxxx/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@xxxxxxxxxxxxxxxxxxxxx/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@xxxxxxxxxx/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@xxxxxxxxxxxxx/

Jeff Hostetler (19):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filters: add omit-all-blobs filter
  list-objects-filters: add omit-large-blobs filter
  list-objects-filters: add use-sparse-checkout filter
  object-filter: common declarations for object filtering
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text
  upload-pack: add filter-objects to protocol documentation
  upload-pack: add object filtering
  fetch-pack: add object filtering support
  connected: add filter_allow_omitted option to API
  clone: add filter arguments
  index-pack: relax consistency checks for omitted objects
  fetch: add object filtering to fetch

 Documentation/git-pack-objects.txt                |  14 +
 Documentation/git-rev-list.txt                    |   7 +-
 Documentation/rev-list-options.txt                |  26 ++
 Documentation/technical/pack-protocol.txt         |  16 +
 Documentation/technical/protocol-capabilities.txt |   7 +
 Makefile                                          |   3 +
 builtin/clone.c                                   |  28 ++
 builtin/fetch-pack.c                              |   3 +
 builtin/fetch.c                                   |  27 +-
 builtin/index-pack.c                              |  15 +
 builtin/pack-objects.c                            |  33 +-
 builtin/rev-list.c                                |  58 +++-
 connected.c                                       |   3 +
 connected.h                                       |   6 +
 dir.c                                             |  53 +++-
 dir.h                                             |   4 +
 fetch-pack.c                                      |  28 ++
 fetch-pack.h                                      |   2 +
 list-objects-filters.c                            | 361 ++++++++++++++++++++++
 list-objects-filters.h                            |  45 +++
 list-objects.c                                    |  66 +++-
 list-objects.h                                    |  30 ++
 object-filter.c                                   | 201 ++++++++++++
 object-filter.h                                   | 145 +++++++++
 oidset2.c                                         | 101 ++++++
 oidset2.h                                         |  56 ++++
 t/t6112-rev-list-filters-objects.sh               |  37 +++
 transport.c                                       |  27 ++
 transport.h                                       |   8 +
 upload-pack.c                                     |  39 ++-
 30 files changed, 1425 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100644 t/t6112-rev-list-filters-objects.sh

--
2.9.3