Op 2025-03-03 om 11:36 schreef phillip.wood123@xxxxxxxxx:
I'm not really sure what a "large" number looks like for people who
track a lot of repositories. On windows forking is pretty slow (our
shell-based test suite is incredibly slow on that platform) so I think
it makes sense to try and avoid adding new processes.
Thanks, I didn't know about forking being slow on Windows.
Do you know if we need to worry about hitting the max number or length
of arguments to a command?
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html
under _POSIX_ARG_MAX says that at least 4KiB should be supported, which
seems low enough to maybe cause trouble, but it's a minimum.
https://www.in-ulm.de/~mascheck/various/argmax/ lists a bunch of values
for different platforms, and from a brief scan it looks like all the
remotely-modern platforms have values that are probably large enough for
this.
I'd like to do some testing to get actual performance numbers before
trying to optimize this, because I think the optimization has some
drawbacks, see below.
If optimization is needed, another approach is to parallelize the forks:
{
local fer_pids=
for ...
do
__git for-each-ref ... &
fer_pids="$fer_pids $!"
done
test -z "$fer_pids" || wait $fer_pids
} | sort | uniq -u
That might cause spikes in cpu/memory/disk usage that aren't ideal
though.
Yes, if there were a 100 remotes that's a bit of a fork-bomb.
I also realized after sending that email that it could run into issues
with the number of available PIDs too. It could be improved by limiting
to n child processes at a time and waiting for one to finish before
starting the next, but that's more complication. I think your solution
below is better.
local sed_cmd=
local remote
# ref names and therefore remote names cannot contain '*?[]^' so we
# only need to escape '.$/'. Using 'sort -r' means that if there is a
# remote called "github" and another called "github/alice" we will try
# and strip "github/alice" first.
for remote in $(__git_remotes | sort -r)
do
remote="${remote//./\\./}"
remote="${remote//\$/\\\$/}"
remote="${remote//\//\\\//}"
Just FYI since it took me hours to figure this out myself: I think
this would break tests on macos because of an old version of bash that
handles backslashes weirdly. I think removing the double quotes would
work around that issue, and be safe because word splitting doesn't
happen in assignments.
Thanks, I'm not familiar with bash's extensions to parameter
substitution. The completions can also but used under zsh (git-
completion.zsh is pretty much abandoned I think) but it looks like bash
and zsh agree on this expansion.
The tests don't automatically run under zsh though, right? Maybe I
should try to figure out how to do that in a separate patch...
sed_cmd="${sed_cmd} -e s/^refs\/remotes\/$remote\/// -e t"
done
Mostly just a note to myself if I end up using this idea: I think a
space in $remote would break this because bash would split up the arg
to sed. There's probably some way to fix that with extra escaping though?
I should have said in the comment above that ref names cannot contain
space, tab, newline etc. so we don't have to worry about the shell's
word splitting splitting the patterns. They cannot contain backslashes
either.
Good to know. Is https://git-scm.com/docs/git-check-ref-format the place
I should have read to understand what escaping is needed and what isn't
because of the limitations on ref names?
__git for-each-ref --format="$fer_pfx$sfx" \
${GIT_COMPLETION_IGNORE_CASE+--ignore-case} \
"refs/remotes/*/$cur_*" "refs/remotes/*/$cur_*/**" |
This would search for $cur_ in the wrong place because * only matches
one path component, right? Changing to ** might help, but then it
would match in places more it shouldn't.
Oh, good point I'd missed that. Could we change sed to run with '-n' and
generate an expression that looks like
-e s|^refs/remotes/$remote/\($cur_\)|\1|p -e t
with the appropriate escaping of $remote and $cur_ so that we only print
the output of for-each-ref that matches $cur_?
I think that would work. I was hoping to avoid sed, and I was thinking
about using shell's ${parameter#word} to strip and test for matching
prefixes. But I can't think of a way to do that with
GIT_COMPLETION_IGNORE_CASE. Modern bash has ${parameter@L} to lowercase
a string, but I don't think the version of bash on macos has that. With
sed it would just be a matter of adding the i flag if needed.
I don't think this is an issue, but do we need to worry about putting
remote names and $cur_ on the command line where any other user on the
same system can read them? $cur_ was already on the command line before
this patch, and remote names appear on other git command lines all the
time, so I doubt they're meant to be secret.
Is a Helped-by trailer the right way to credit you for your suggestions
on this?