Re: [PATCH v3 02/11] setup: introduce startup_info->original_cwd

Elijah Newren <newren@xxxxxxxxx> · Mon, 29 Nov 2021 13:58:08 -0800

On Mon, Nov 29, 2021 at 12:41 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
>
> On Sat, Nov 27 2021, Elijah Newren wrote:
>
> > On Sat, Nov 27, 2021 at 6:00 AM Ævar Arnfjörð Bjarmason
> > <avarab@xxxxxxxxx> wrote:
> >>
> >> On Fri, Nov 26 2021, Elijah Newren via GitGitGadget wrote:
> >>
> >> [Just some more "I haven't really looked at this in all that much
> >> detail" commentary, so maybe it's stupid, sorry]
> >>
> >> > From: Elijah Newren <newren@xxxxxxxxx>
> >> >
> >> > Removing the current working directory causes all subsequent git
> >> > commands run from that directory to get confused and fail with a message
> >> > about being unable to read the current working directory:
> >> >
> >> >     $ git status
> >> >     fatal: Unable to read current working directory: No such file or directory
> >> >
> >> > Non-git commands likely have similar warnings or even errors, e.g.
> >> >
> >> >     $ bash -c 'echo hello'
> >> >     shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
> >> >     hello
> >>
> >> Is that really realistic?
> >
> > I have 321 shell scripts (or symlinks thereto) in /usr/bin/ on my
> > Fedora laptop, and 951 shell scripts in /usr/bin/ on my Ubuntu
> > workstation at $DAYJOB.  That's not even considering stuff in other
> > directories.  Note that I didn't place any of those scripts in
> > /usr/bin/; they came from the distribution and/or corporate packages
> > from others.  Any invocation of bash will see the above "error"
> > message.  Granted, bash calls it an 'error' but continues anyway, but
> > I thought it showed that there were clearly programs besides git where
> > users could run into problems.
> >
> >> Any "normal" command would use "pwd" or look
> >> at $PWD, both of which "work", this error is only because we're starting
> >> a new shell.
> >
> > <grin>
> >
> > Yeah, good point.  Who would use an unusual program like git?  Or a
> > shell script?  Or a java program?  Or emacs?  Or other programs like
> > them?  Besides, git, bash, java, and emacs are all relatively young
> > commands with small development teams.  Maybe we should just rely on
> > users only using commands that get the pwd/$PWD detail right; perhaps
> > limiting to commands that are more mature and have bigger development
> > teams than those four.
> >
> > Silly users.
> >
> > ;-)
> >
> > Sorry, couldn't resist a little friendly teasing.
> >
> >
> > However, for "pwd"...by "work", do you mean "doesn't necessarily
> > work"?  On my machine:
> >
> > $ mkdir gone
> > $ cd gone
> > $ rmdir ../gone
> > $ pwd -P
> > pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
> > $ pwd
> > pwd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
> > $ echo $PWD
> > /home/newren/floss/git/gone
> >
> > If I do not run `pwd -P` first then a plain `pwd` works.  But your
> > advice to use `pwd` seems misguided for programs, since they'll hit
> > this problem if users have run a previous `pwd -P`.  (The $PWD trick
> > would have worked, as shown above)
> >
> >> I wonder if it was just because you ran into our bin-wrappers edge case,
> >> but that should be really obscure for any real users.
>
> For some reason I was under the misimpression that the "#!/bin/bash"
> part of the bin-wrappers and other scripts somehow immunized them from
> the $PWD/"pwd" reset, and it was only the programs they invoked (like
> git in the bin-wrappers) that didn't get the values passed along.
>
> But that's clearly incorrect as you demonstrate above, so the only thing
> that'll work OK (seemingly) is running "pwd" (but not "pwd -P"), or
> looking at $PWD in your terminal shell itself.

And further, even a plain "pwd" seems not to work if something else
has already run "pwd -P", as I pointed out above.  I have no clue how
one 'pwd -P' call can prevent future plain 'pwd' calls from working,
but that's the behavior I observed.

> Invoking non-POSIX shell programs "works" in that they can use the same
> trick, after the dance of "mkdir x && cd x && rm -rf ../x" e.g. Perl
> says:
>
>     $ perl -MCwd=getcwd -wE 'say $ENV{PWD}; say getcwd'
>     /home/avar/g/git/x
>     Use of uninitialized value in say at -e line 1
>
> This "pwd -P" behavior isn't just something weird in your shell & mine,
> it seems to be mandated by POSIX:
> https://pubs.opengroup.org/onlinepubs/007904875/utilities/pwd.html
>
> All of which is to say I'm much more sympathetic to this approach
> now. I.e. it seemed like purely a way to work around fixable breakages
> in our own and other programs. Well, I guess "don't use shellscript if
> you care about this edge case" is a "fix", but not a trivial one.
>
> I do still think a better starting point would be fixing the setup.c
> dying in our own code, and see where that leaves us, but up to you
> etc. I'm not going to work on it any time soon.
>
> > Yes, I agree our bin-wrappers is a really obscure edge case not worth
> > considering.  I don't see how you jump from there to painting all
> > shell scripts combined with the same brush, though.
>
> *nod*
>
> >> > This confuses end users, particularly since the command they get the
> >> > error from is not the one that caused the problem; the problem came from
> >> > the side-effect of some previous command.
> >> >
> >> > We would like to avoid removing the current working directory of our
> >> > parent process; towards this end, introduce a new variable,
> >> > startup_info->original_cwd, that tracks the current working directory
> >> > that we inherited from our parent process.  For convenience of later
> >> > comparisons, we prefer that this new variable store a path relative to
> >> > the toplevel working directory (thus much like 'prefix'), except without
> >> > the trailing slash.
> >>
> >> I'm still not clear at all on why we need a "original_cwd" at all then
> >> as opposed to just using "prefix" (or adding "the_prefix" if passing it
> >> down is painful). I.e. we discover our relative path, we resolve the
> >> relative path to the root, can't we use that as the "don't remove our
> >> CWD" guard?
> >>
> >> Does our prefix change at some point, then maybe "orig_prefix" would
> >> make more sense?
> >
> > No; see the code comment in the same patch:
> >
> > +       /*
> > +        * startup_info->original_cwd points to the current working
> > +        * directory we inherited from our parent process, which is a
> > +        * directory we want to avoid removing.
> > +        *
> > +        * For convenience, we would like to have the path relative to the
> > +        * worktree instead of an absolute path.
> > +        *
> > +        * Yes, startup_info->original_cwd is usually the same as 'prefix',
> > +        * but differs in two ways:
> > +        *   - prefix has a trailing '/'
> > +        *   - if the user passes '-C' to git, that modifies the prefix but
> > +        *     not startup_info->original_cwd.
> > +        */
> >
> > It's never equal to prefix, even though it's usually semantically
> > referring to the same directory.  However, even if it weren't for the
> > trailing slash issue, the -C case means it is not appropriate to think
> > of it as "orig_prefix" either.
>
> Ah, with -C of e.g. t/helper we'll first chdir(t/helper), and then run
> the usual setup.c dance to find that we need to chdir() again to the
> (equivalent of) "../../". But our prefix stays at "t/helper".
>
> I'm a bit confused about the trailing slash case. Isn't the prefix
> always going to point to a directory? Why would t/helper vs. t/helper/
> matter?

Because when you compare two paths, you either need to canonicalize
them in some fashion or do a bunch of tricky work to see if they are
"equal".

> I think it won't matter for rmdir(2) et al, but maybe I'm wrong.

Correct, rmdir won't care, but strcmp() certainly does.  Since we call
strcmp() before deciding whether to remove, and we do that based on
paths built up from prefix + tree traversal + entry name (which won't
have the trailing '/') we want original_cwd to be canonicalized the
same way (i.e. without the trailing '/').

(In contrast, prefix is used as a starting point to build up other
paths, so having a trailing slash was more convenient for it.)

> What got me confused about the "prefix" vs. "original_cwd" is that I
> was assuming they'd be the same. The commentary on setup_git_directory()
> says as much, i.e.:
>
>     Returns the "prefix", a path to the current working directory
>     relative to the work tree root, or NULL,
>
> But of course we know that's a white lie, it's not the $PWD/getcwd().

I don't think it is a white lie.  This code runs after the chdir()
from the parsing of the global -C option, and before the chdir() to
the worktree root (which occurs later in setup.c).  So the statement
is true, you just have to understand what "current" means in "the
current working directory".  In this context, "current" means *as of
the time the setup.c code where this comment is written starts
running*, which is a pretty logical thing for current to mean.  If you
try reading this code unaware of the other chdir() calls we make,
though, it certainly has the potential to trip you up.

> So you're only trying to save the user in cases of e.g. (in t/helper):
>
>     git rm ../helper
>
> If they actually run it while in t/helper, but would like to explicitly
> omit the case of (at the top-level):
>
>     git -C t/helper rm ../helper
>
> That's fair enough I guess. I'd just assumed those cases would be
> treated the same way.

I don't see why you'd assume that.  The whole point was preserving the
cwd of our parent process so that any subsequent commands that happen
to run in that parent process don't get confused by having their cwd
no longer exist.  In the former command you gave, the cwd of the
parent process was $WORKTREE/t/helper; in the latter, it was
$WORKTREE.  So, they should preserve different things.

> Even in that case, I can't think of a case where this "original_cwd"
> wouldn't be made redundant by some boolean flag to accompany "prefix" to
> indicate that we've chdir'd out of the prefix (or not).

I can't think of a case where a boolean would provide any useful
information.  Don't we always chdir out of the prefix whenever prefix
!= NULL?

Maybe I'm missing something, so let me give you a simple case where
you can explain how this boolean could help.  Let's say you started at
the worktree root and then ran these two commands:

   $ cd subdir/
   $ git -C subsubdir rm lastfile

After finishing the setup phase of the git command, the state should
be the following:
   prefix = subdir/subsubdir/
   original_cwd = subdir
   getcwd = subdir/../  (because setup.c chdirs to the worktree root;
                         the actual answer would be a more
                         canonicalized path, though)

In this above case, how can we infer the value of original_cwd from
other data if we don't record it?