Re: [PATCH 3/3] completion: fix bugs with slashes in remote names

David Mandelberg <david@xxxxxxxxxxxxxx> · Mon, 3 Mar 2025 14:19:05 -0500

Op 2025-03-03 om 11:36 schreef phillip.wood123@xxxxxxxxx:
I'm not really sure what a "large" number looks like for people who 
track a lot of repositories. On windows forking is pretty slow (our 
shell-based test suite is incredibly slow on that platform) so I think 
it makes sense to try and avoid adding new processes.

Thanks, I didn't know about forking being slow on Windows.

Do you know if we need to worry about hitting the max number or length 
of arguments to a command? 
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html 
under _POSIX_ARG_MAX says that at least 4KiB should be supported, which 
seems low enough to maybe cause trouble, but it's a minimum. 
https://www.in-ulm.de/~mascheck/various/argmax/ lists a bunch of values 
for different platforms, and from a brief scan it looks like all the 
remotely-modern platforms have values that are probably large enough for 
this.

I'd like to do some testing to get actual performance numbers before 
trying to optimize this, because I think the optimization has some 
drawbacks, see below.

If optimization is needed, another approach is to parallelize the forks:

{
     local fer_pids=
     for ...
     do
         __git for-each-ref ... &
         fer_pids="$fer_pids $!"
     done
     test -z "$fer_pids" || wait $fer_pids
} | sort | uniq -u

That might cause spikes in cpu/memory/disk usage that aren't ideal 
though.

Yes, if there were a 100 remotes that's a bit of a fork-bomb.

I also realized after sending that email that it could run into issues 
with the number of available PIDs too. It could be improved by limiting 
to n child processes at a time and waiting for one to finish before 
starting the next, but that's more complication. I think your solution 
below is better.

local sed_cmd=
local remote
# ref names and therefore remote names cannot contain '*?[]^' so we
# only need to escape '.$/'. Using 'sort -r' means that if there is a
# remote called "github" and another called "github/alice" we will try
# and strip "github/alice" first.
for remote in $(__git_remotes | sort -r)
do
     remote="${remote//./\\./}"
     remote="${remote//\$/\\\$/}"
     remote="${remote//\//\\\//}"

Just FYI since it took me hours to figure this out myself: I think 
this would break tests on macos because of an old version of bash that 
handles backslashes weirdly. I think removing the double quotes would 
work around that issue, and be safe because word splitting doesn't 
happen in assignments.

Thanks, I'm not familiar with bash's extensions to parameter 
substitution. The completions can also but used under zsh (git- 
completion.zsh is pretty much abandoned I think) but it looks like bash 
and zsh agree on this expansion.

The tests don't automatically run under zsh though, right? Maybe I 
should try to figure out how to do that in a separate patch...

     sed_cmd="${sed_cmd} -e s/^refs\/remotes\/$remote\/// -e t"
done

Mostly just a note to myself if I end up using this idea: I think a 
space in $remote would break this because bash would split up the arg 
to sed. There's probably some way to fix that with extra escaping though?

I should have said in the comment above that ref names cannot contain 
space, tab, newline etc. so we don't have to worry about the shell's 
word splitting splitting the patterns. They cannot contain backslashes 
either.

Good to know. Is https://git-scm.com/docs/git-check-ref-format the place 
I should have read to understand what escaping is needed and what isn't 
because of the limitations on ref names?

__git for-each-ref --format="$fer_pfx$sfx" \
     ${GIT_COMPLETION_IGNORE_CASE+--ignore-case} \
     "refs/remotes/*/$cur_*" "refs/remotes/*/$cur_*/**" |

This would search for $cur_ in the wrong place because * only matches 
one path component, right? Changing to ** might help, but then it 
would match in places more it shouldn't.

Oh, good point I'd missed that. Could we change sed to run with '-n' and 
generate an expression that looks like

     -e s|^refs/remotes/$remote/\($cur_\)|\1|p -e t

with the appropriate escaping of $remote and $cur_ so that we only print 
the output of for-each-ref that matches $cur_?

I think that would work. I was hoping to avoid sed, and I was thinking 
about using shell's ${parameter#word} to strip and test for matching 
prefixes. But I can't think of a way to do that with 
GIT_COMPLETION_IGNORE_CASE. Modern bash has ${parameter@L} to lowercase 
a string, but I don't think the version of bash on macos has that. With 
sed it would just be a matter of adding the i flag if needed.

I don't think this is an issue, but do we need to worry about putting 
remote names and $cur_ on the command line where any other user on the 
same system can read them? $cur_ was already on the command line before 
this patch, and remote names appear on other git command lines all the 
time, so I doubt they're meant to be secret.

Is a Helped-by trailer the right way to credit you for your suggestions 
on this?