Re: [PATCH v2 1/8] for-each-ref: add --stdin option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 10, 2023 at 05:20:56PM +0000, Derrick Stolee via GitGitGadget wrote:

> When a user wishes to input a large list of patterns to 'git
> for-each-ref' (likely a long list of exact refs) there are frequently
> system limits on the number of command-line arguments.
> 
> Add a new --stdin option to instead read the patterns from standard
> input. Add tests that check that any unrecognized arguments are
> considered an error when --stdin is provided. Also, an empty pattern
> list is interpreted as the complete ref set.
> 
> When reading from stdin, we populate the filter.name_patterns array
> dynamically as opposed to pointing to the 'argv' array directly. This
> requires a careful cast while freeing the individual strings,
> conditioned on the --stdin option.

This is a nice feature to have, but I suspect like other pattern
features in Git (e.g., pathspecs), the matching is linear, and thus
pre-expanding the set of refs you're interested in becomes accidentally
quadratic.

And that seems to be the case here. If I have N refs and feed the whole
set as patterns via --stdin:

-- >8 --
for i in 4000 8000 16000 32000; do
  rm -rf repo
  git init -q repo
  (
    cd repo
    git commit --allow-empty -qm foo
    perl -e '
      my ($oid, $n) = @ARGV;
      print "create refs/heads/branch$_ $oid\n" for (1..$n);
    ' $(git rev-parse HEAD) $i |
    git update-ref --stdin
    git for-each-ref --format='%(refname)' >refs
    echo -n "$i: "
    command time -f %U \
      git.compile for-each-ref --stdin <refs 2>&1 >/dev/null
  )
done
-- 8< --

then the result quadruples for every doubling of the refs.

  4000: 0.32
  8000: 1.33
  16000: 5.10
  32000: 20.90

That may or may not be a show-stopper for your use case, and if not,
I don't think it's something we need to address immediately. But we may
want some kind of "literal" mode, that takes in a list of refs rather
than a list of patterns, and does a sorted-merge with the list of
available refs (or uses a hash table, I guess, but for-each-ref also
tries to avoid even being linear in the total number of refs, so you'd
still want to find the lowest/highest to bound the iteration).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux