Re: [PATCH 3/3] completion: fix bugs with slashes in remote names

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Op 2025-03-03 om 11:36 schreef phillip.wood123@xxxxxxxxx:
I'm not really sure what a "large" number looks like for people who track a lot of repositories. On windows forking is pretty slow (our shell-based test suite is incredibly slow on that platform) so I think it makes sense to try and avoid adding new processes.

Thanks, I didn't know about forking being slow on Windows.

Do you know if we need to worry about hitting the max number or length of arguments to a command? https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/limits.h.html under _POSIX_ARG_MAX says that at least 4KiB should be supported, which seems low enough to maybe cause trouble, but it's a minimum. https://www.in-ulm.de/~mascheck/various/argmax/ lists a bunch of values for different platforms, and from a brief scan it looks like all the remotely-modern platforms have values that are probably large enough for this.

I'd like to do some testing to get actual performance numbers before trying to optimize this, because I think the optimization has some drawbacks, see below.

If optimization is needed, another approach is to parallelize the forks:

{
     local fer_pids=
     for ...
     do
         __git for-each-ref ... &
         fer_pids="$fer_pids $!"
     done
     test -z "$fer_pids" || wait $fer_pids
} | sort | uniq -u

That might cause spikes in cpu/memory/disk usage that aren't ideal though.

Yes, if there were a 100 remotes that's a bit of a fork-bomb.

I also realized after sending that email that it could run into issues with the number of available PIDs too. It could be improved by limiting to n child processes at a time and waiting for one to finish before starting the next, but that's more complication. I think your solution below is better.

local sed_cmd=
local remote
# ref names and therefore remote names cannot contain '*?[]^' so we
# only need to escape '.$/'. Using 'sort -r' means that if there is a
# remote called "github" and another called "github/alice" we will try
# and strip "github/alice" first.
for remote in $(__git_remotes | sort -r)
do
     remote="${remote//./\\./}"
     remote="${remote//\$/\\\$/}"
     remote="${remote//\//\\\//}"

Just FYI since it took me hours to figure this out myself: I think this would break tests on macos because of an old version of bash that handles backslashes weirdly. I think removing the double quotes would work around that issue, and be safe because word splitting doesn't happen in assignments.

Thanks, I'm not familiar with bash's extensions to parameter substitution. The completions can also but used under zsh (git- completion.zsh is pretty much abandoned I think) but it looks like bash and zsh agree on this expansion.

The tests don't automatically run under zsh though, right? Maybe I should try to figure out how to do that in a separate patch...

     sed_cmd="${sed_cmd} -e s/^refs\/remotes\/$remote\/// -e t"
done

Mostly just a note to myself if I end up using this idea: I think a space in $remote would break this because bash would split up the arg to sed. There's probably some way to fix that with extra escaping though?

I should have said in the comment above that ref names cannot contain space, tab, newline etc. so we don't have to worry about the shell's word splitting splitting the patterns. They cannot contain backslashes either.

Good to know. Is https://git-scm.com/docs/git-check-ref-format the place I should have read to understand what escaping is needed and what isn't because of the limitations on ref names?

__git for-each-ref --format="$fer_pfx$sfx" \
     ${GIT_COMPLETION_IGNORE_CASE+--ignore-case} \
     "refs/remotes/*/$cur_*" "refs/remotes/*/$cur_*/**" |

This would search for $cur_ in the wrong place because * only matches one path component, right? Changing to ** might help, but then it would match in places more it shouldn't.

Oh, good point I'd missed that. Could we change sed to run with '-n' and generate an expression that looks like

     -e s|^refs/remotes/$remote/\($cur_\)|\1|p -e t

with the appropriate escaping of $remote and $cur_ so that we only print the output of for-each-ref that matches $cur_?

I think that would work. I was hoping to avoid sed, and I was thinking about using shell's ${parameter#word} to strip and test for matching prefixes. But I can't think of a way to do that with GIT_COMPLETION_IGNORE_CASE. Modern bash has ${parameter@L} to lowercase a string, but I don't think the version of bash on macos has that. With sed it would just be a matter of adding the i flag if needed.

I don't think this is an issue, but do we need to worry about putting remote names and $cur_ on the command line where any other user on the same system can read them? $cur_ was already on the command line before this patch, and remote names appear on other git command lines all the time, so I doubt they're meant to be secret.

Is a Helped-by trailer the right way to credit you for your suggestions on this?




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux