[PATCH 0/7] rev-list: implement object type filter

Hi,

I've recently had the use case of retrieving all blobs introduced between
two versions which are smaller than 200 bytes, in order to find all
potential candidates for LFS pointers. This is currently done with
`git rev-list --objects --filter=blob:limit=200 <newrev> ^<oldrev>`, but
this is rather inefficient: the resulting list is way too long, as it
also potentially includes tags, commits and trees.
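
To illustrate the extra work this implies, here is a minimal sketch of
how the non-blob entries could be weeded out after the fact. It assumes
the rev-list output is piped through `git cat-file --batch-check` with
the format `%(objectname) %(objecttype) %(objectsize) %(rest)`. The
helper name `small_blobs` is made up for this example and is not part of
the series.

```shell
# Hypothetical helper, not part of this series.
# Reads "<oid> <type> <size> [<path>]" lines on stdin and prints the
# object IDs of blobs strictly smaller than the given limit (default
# 200), mirroring what blob:limit=<n> keeps.
small_blobs() {
    awk -v limit="${1:-200}" \
        '$2 == "blob" && $3 + 0 < limit + 0 { print $1 }'
}

# Intended usage (needs a repository; <newrev> and <oldrev> are
# placeholders):
#   git rev-list --objects <newrev> ^<oldrev> |
#   git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' |
#   small_blobs 200
```

Every object reachable in the range has to pass through this pipeline
just so that most of them can be thrown away again.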

To be able to more efficiently answer this query, I've implemented
multiple things:

- A new object type filter `--filter=object:type=<type>` for
  git-rev-list(1), which is implemented both for normal graph walks and
  for the packfile bitmap index.

- Given that the above use case requires two filters (the object type
  and blob size filters), bitmap filters were extended to support
  combined filters.

- git-rev-list(1) doesn't filter user-provided objects and always prints
  them. I don't want the listed commits, though, only the potential LFS
  blobs they reference. So I've added a new flag `--filter-provided`
  which marks all provided objects as not-user-provided such that they
  get filtered the same as all the other objects.

Altogether, this ends up with the following queries, both of which have
been executed in a well-packed linux.git repository:

    # Previous query which uses object names as a heuristic to filter
    # non-blob objects, which bars us from using bitmap indices because
    # they cannot print paths.
    $ time git rev-list --objects --filter=blob:limit=200 \
        --object-names --all | sed -r '/^.{,41}$/d' | wc -l
    4502300

    real 1m23.872s
    user 1m30.076s
    sys  0m6.002s

    # New query.
    $ time git rev-list --objects --filter-provided \
        --filter=object:type=blob --filter=blob:limit=200 \
        --use-bitmap-index --all | wc -l
    22585

    real 0m19.216s
    user 0m16.768s
    sys  0m2.450s

So with the new optimized query, we can significantly reduce both the
list of candidate LFS pointers (from 4502300 to 22585 entries) and the
execution time (from about 1m24s to about 19s of wall-clock time).
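
The remaining candidates still need to be inspected, since a small blob
is not necessarily an LFS pointer. A minimal sketch of such a check
follows. It assumes the Git LFS pointer format, in which the first line
reads `version https://git-lfs.github.com/spec/v1`. The function name
`is_lfs_pointer` is hypothetical and not part of this series.

```shell
# Hypothetical helper, not part of this series.
# Reads blob contents on stdin and succeeds if the first line matches
# the version line of a Git LFS pointer file.
is_lfs_pointer() {
    first_line=
    # Tolerate EOF without a trailing newline; the comparison decides.
    IFS= read -r first_line || :
    [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Intended usage (needs a repository; <oid> is a placeholder):
#   git cat-file blob <oid> | is_lfs_pointer && echo "<oid>"
```

This only looks at the version line and does not validate the `oid` and
`size` lines of the pointer.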

Patrick

Patrick Steinhardt (7):
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/rev-list-options.txt  |   3 +
 builtin/rev-list.c                  |  14 ++++
 list-objects-filter-options.c       |  14 ++++
 list-objects-filter-options.h       |   8 ++
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  32 +++++++-
 pack-bitmap.c                       |  71 +++++++++++++++--
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  54 ++++++++++++-
 12 files changed, 380 insertions(+), 17 deletions(-)

-- 
2.30.1
