Re: [PATCH 8/8] dir: avoid removing the current working directory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 23, 2021 at 5:19 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Tue, Nov 23 2021, Elijah Newren wrote:
>
> > On Tue, Nov 23, 2021 at 10:19 AM Glen Choo <chooglen@xxxxxxxxxx> wrote:
> >>
> >> Elijah Newren <newren@xxxxxxxxx> writes:
> >> [...]
> >> I'm not going to say that we'll *definitely* need remove_path()
> >> in its current form, but mixing concerns like this is an invitation to
> >> unexpected behavior. An (imperfect) example that demonstrates this
> >> principle is https://lore.kernel.org/git/24bffdab139435173712101aaf72f7277298c99d.1632497954.git.gitgitgadget@xxxxxxxxx/,
> >> where we made a change to a generic path matching function in order to
> >> speed up unpack_trees(), but accidentally ended up breaking gitignore.
> >
> > There's no mixture of concerns; my patch is correcting this library
> > function to more fully match its documented intent; from dir.h:
> >
> >     /* tries to remove the path with empty directories along it,
> > ignores ENOENT */
> >     int remove_path(const char *path);
> >
> > Since the parent process's current working directory is still likely
> > parked in that directory, there is a good reason to treat it as
> > non-empty.  Thus the cwd should not be one of those directories
> > removed along with the specified path.  No need to die, just stop
> > removing the leading directories once it hits the cwd (much like it'd
> > stop once it hit a directory that had files left in it).
>
> I can buy that we'd pick this new behavior as a worthwhile trade-off,
> but not that anyone intended for this to be the behavior all along.
>
> I don't think "a process is sitting in it" has ever been anyone's idea
> of a "non-empty directory". Rather it's what rmdir() returning EEXIST or
> ENOTEMPTY maps to.

Yeah, Junio commented on my reasoning in that same paragraph of mine.
Bad reasoning on my part, and you were both right to call it out.

But that reasoning wasn't the underlying motivation for Peff to
suggest the behavior behind this series[1] and this patch, nor the
rationale Junio used to say that the overall behavioral change behind
this series "makes sense".[2]

[1] https://lore.kernel.org/git/YS8eEtwQvF7TaLCb@xxxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/git/xmqqo86elyht.fsf@gitster.g/

> Doesn't this series also change the behavior of e.g.:
>
>     cd contrib/subtree
>     git rm -r ../subtree

Yes, of course!

Before:

    $ cd contrib/subtree
    $ git rm -r -q ../subtree/
    $ ls -ld ../subtree/
    ls: cannot access '../subtree/': No such file or directory
    $ git status --porcelain
    fatal: Unable to read current working directory: No such file or directory
    $ git checkout HEAD ../subtree/
    fatal: Unable to read current working directory: No such file or directory

After:

    $ cd contrib/subtree
    $ git rm -r -q ../subtree/
    $ ls -ld ../subtree/
    drwxrwxr-x. 1 newren newren 0 Nov 23 19:18 ../subtree/
    $ git status --porcelain
    D  contrib/subtree/.gitignore
    D  contrib/subtree/COPYING
    D  contrib/subtree/INSTALL
    D  contrib/subtree/Makefile
    D  contrib/subtree/README
    D  contrib/subtree/git-subtree.sh
    D  contrib/subtree/git-subtree.txt
    D  contrib/subtree/t/Makefile
    D  contrib/subtree/t/t7900-subtree.sh
    D  contrib/subtree/todo
    $ git checkout HEAD ../subtree/
    Updated 10 paths from c557be478e

Very nice fix, don't you think?


> If so then re the "spidey sense" comment I had earlier: There's no rm
> codepaths or tests changed by this series,

That's not correct; I explicitly added a new rm test in the first
patch in my series.  Further, that same test was modified to mark it
as passing by this particular patch you are commenting on.

> so the implementation of
> doing it at this lower level might be casting too wide a net.

I'm getting the vibe that you are assuming I'm changing these two
functions without realizing what places might be calling them;
basically, that I'm just flippantly changing them.  Ignoring the
ramifications of such an assumption, if this vibe is correct, then let
me inform you that I've read over each and every caller (as well as
searched for other callers of unlink() and rmdir() throughout the tree
to see if they needed similar changes).  In my opinion, *each* *and*
*every* *single* *one* of the calls to remove_path() and
remove_dir_recursively() should take the behavioral change suggested
in this patch.

It's also not clear to me that you understand the point of the change
behind the series.  Clearly, I'm not doing well explaining it, but
have you read Peff's or Junio's comments on why they thought
protecting the_original_cwd makes sense?  Again, see the links [1] and
[2] above.  I think it'd help me understand how to respond to you
better if you could clarify to me whether you disagree with them, or
whether you agree with them but think I've gone wrong in the
implementation of their high level explanation somehow in this
particular patch.

> e.g. changing callers that use "remove_dir_recursively()" to use a
> "remove_dir_recursively_not_cwd()" (or whatever) be a gentler way of
> introducing this, and make sure that each step of the way we grok what's
> being changed, that there's test coverage etc.

I'll ask you the same two questions I asked Glen when he suggested
basically the same thing; if you can provide an answer to either one
of my questions that is compelling to me, I'd be totally onboard with
your suggested change:

(1) Can you point to any concrete example caller anywhere in the code
tree (or even provide a future hypothetical caller) that would in fact
be harmed by the change in this patch?

...and no, I'm not asking you to do my work for me; I've done that
same work -- in fact looking at all callsites -- and came up
empty-handed.  Since this and your previous email are essentially
claiming that I've probably missed things, I think it's a fair
question.

(2) What benefit would there be to introducing these new functions?

In particular, if I were to introduce these new functions, it would
look like this:

  * add new remove_path_not_cwd() and remove_dir_recursively_not_cwd()
function in one patch
  * convert all relevant callsites to use these new functions in the
subsequent patch(es)
  * delete the existing remove_path() and remove_dir_recursively()
functions for two reasons: (1) they are now unused, and (2) future
potential callers of these old functions would more than likely
reintroduce the bugs this series is trying to fix if they were to use
them and should be discouraged from doing so
  * rename {remove_path,remove_dir_recursively}_not_cwd() to remove
the "_not_cwd" suffix in order to have more memorable and less ugly
names

Once finished, the end result would be identical to this patch.  Well,
identical other than taking more patches to get to the same end result
and using up more reviewer time.  If there's some benefit to taking
this circuitous route, though, I'm more than willing to do so.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux