Re: mt/dir-iterator-updates, was Re: What's cooking in git.git (Jul 2019, #01; Wed, 3)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Dscho

On Thu, Jul 4, 2019 at 7:02 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Junio,
>
> On Wed, 3 Jul 2019, Junio C Hamano wrote:
>
> > * mt/dir-iterator-updates (2019-06-25) 10 commits
[...]
> >  Is this ready for 'next'?
>
> No. It still breaks many dozens of test cases on Windows (if not more)
> because it thinks that it can rely on `st_ino` to detect circular
> symlinks.

I wanted to take a look at the failures to see if I could help, but
I'm not very familiar with azure (and travis-ci doesn't run windows'
tests). So the only build I could find, containing the code from this
series, is this[1]. But it only shows 4 failures and they don't seem
to relate with dir-iterator... Could you point me to the right place,
please?

> In
> https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@xxxxxxxxxxxxxxxxx/
> I had suggested to do something like this:
>
[...]
>
> However, in the meantime I thought about this a bit more and I remembered
> how this is done elsewhere: I saw many recursive symlink resolvers that
> just have an arbitrary cut-off after following, say, 32 links.
>
> In fact, Git itself already has this in abspath.c:
>
>         /* We allow "recursive" symbolic links. Only within reason, though. */
>         #ifndef MAXSYMLINKS
>         #define MAXSYMLINKS 32
>         #endif
>
> But then, the patch in question uses `stat()` instead of `lstat()`, so we
> would not have any way to count the number of symbolic links we followed.

Hm, I think `stat()` itself uses this strategy of an arbitrary cut-off
when resolving a path. So we may also "ignore" circular symlinks and
let the iteration continue until the point where `stat()` will return
an ELOOP. (But it may be expensive...)

> Do we _have_ to, though? At some stage the path we come up with is beyond
> `PATH_MAX` and we can stop right then and there.
>
> Besides, the way `find_recursive_symlinks()` is implemented adds quadratic
> behavior.

Yes, indeed. But it only happens when we have a path like this:
`symlink_to_dir1/symlink_to_dir2/symlink_to_dir3/symlink_to_dir4/...`,
right? I think this case won't be very usual on actual filesystems,
thought.

> So I would like to submit the idea of simplifying the code drastically,
> by skipping the `find_recursive_symlinks()` function altogether.
>
> This would solve another issue I have with that function, anyway: The name
> suggests, to me at least, that we follow symlinks recursively. It does
> not. I think that could have been addressed by using the adjective
> "circular" instead of "recursive".

Indeed, "circular" sounds much better then "recursive".

> But I also think there are enough
> reasons to do away with this function in the first place.

We can delegate the circular symlinks problem to `stat()'s ELOOP` or
`path_len > PATH_MAX`. The only downside is the overhead of iterating
through directories which will be latter discarded for being in
circular symlinks' chains. I mean, the overhead at dir-iterator
shouldn't be much, but the one on the code using this API to do
something for each of these directories (and its contents), may be.
Also we would need to "undo" the work done for each of these
directories if we want to ignore circular symlinks and continue the
iteration, whereas if we try to detect it a priori, we can skip it
from the beginning.

> Ciao,
> Dscho

[1]: https://dev.azure.com/git-for-windows/git/_build/results?buildId=38852



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux