Re: git log exclude pathspec from file - supported? plans?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 30, 2021 at 08:22:35PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > The problem is that we try to linearly match every pathspec against
> > every path we consider, so it's quadratic-ish in the number of files in
> > the repo. I played a long time ago with storing non-wildcard pathspecs
> > in a trie that we could traverse as we talked the individual trees we
> > were matching. It performed well, but IIRC the interface was hacky (I
> > had to bolt it specifically onto the way the tree-walker uses
> > pathspecs, and the other pathspec matchers didn't benefit at all).
> >
> > I can probably dig it up if anybody's interested in looking at it.
> 
> If it's not too much trouble I'd find it interesting, but I likely won't
> do anything with it any time soon.

The patches are from 2014 (yikes). I forward ported them to the current
tip of master, but beware:

  - it fails a test in t4010 matching a gitlink with a trailing "/" in the
    pathspec (which I think just didn't exist back then). I suspect this
    is a simple bug (the trie has to record a bit for "this must be a
    dir", and it probably just doesn't understand gitlinks properly).

  - the original renamed diff_tree_sha1() to diff_tree_sha1_recurse(),
    which we then call internally while recursing (to pass the trie
    cursor). And then top-level calls use a diff_tree_sha1() wrapper
    that kicks off the traversal (to avoid disrupting other callers).

    When forward-porting, those functions got renamed and reorganized to
    handle multi-parent diffs. I was able to just use the internal
    ll_diff_tree_paths() as my "recurse" version, but I'm not 100% sure
    that's right (e.g., should internal calls to ll_diff_tree_oid also
    grow a cursor parameter?)

  - a few bits had to be ported from the 2-parent case to handle
    n-parents. It _looked_ pretty trivial as I did it, but it's entirely
    possible there are gotchas. Tests seem to pass, though (there are no
    specific tests for finding changes in a merge here, but all tree
    diffs should be using the new code).

  - there are a few basic tests included, but this was never run in any
    kind of production setting. Caveat emptor. :)

> One of the PCREv2 experiments I had very early WIP work towards was to
> create a search index for commit messages, contents etc. and stick it in
> something similar to the --changed-paths part of the commit-graph.

Yeah, to some degree --change-paths may mitigate this, since for a
series of simple pathspecs we'd generate the bloom filter once and then
get O(1) matching per commit.

My ulterior motive with all of this was to plug it into our custom
"blame-tree" (i.e., which commit last-touched each path), with the idea
being that we'd start with a big pathspec, and then shrink it as we find
answers. Removing an item from a bloom filter is awkward, but possible.

Anyway, here are the patches.

  [1/3]: pathspec: add optional trie index
  [2/3]: pathspec: turn on tries when appropriate
  [3/3]: tree-diff: use pathspec tries

 Makefile                      |   1 +
 pathspec.c                    | 127 ++++++++++++++++++++++++++++++++++
 pathspec.h                    |  13 ++++
 t/helper/test-pathspec.c      |  96 +++++++++++++++++++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/perf/p4002-diff-pathspec.sh |  26 +++++++
 t/t6137-pathspec-trie.sh      |  57 +++++++++++++++
 tree-diff.c                   | 102 +++++++++++++++++++++++----
 9 files changed, 411 insertions(+), 13 deletions(-)
 create mode 100644 t/helper/test-pathspec.c
 create mode 100755 t/perf/p4002-diff-pathspec.sh
 create mode 100755 t/t6137-pathspec-trie.sh

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux