Hi Junio, On Wed, 3 Jul 2019, Junio C Hamano wrote: > * mt/dir-iterator-updates (2019-06-25) 10 commits > - clone: replace strcmp by fspathcmp > - clone: use dir-iterator to avoid explicit dir traversal > - clone: extract function from copy_or_link_directory > - clone: copy hidden paths at local clone > - dir-iterator: add flags parameter to dir_iterator_begin > - dir-iterator: refactor state machine model > - dir-iterator: use warning_errno when possible > - dir-iterator: add tests for dir-iterator API > - clone: better handle symlinked files at .git/objects/ > - clone: test for our behavior on odd objects/* content > > Adjust the dir-iterator API and apply it to the local clone > optimization codepath. > > Is this ready for 'next'? No. It still breaks many dozens of test cases on Windows (if not more) because it thinks that it can rely on `st_ino` to detect circular symlinks. In https://public-inbox.org/git/nycvar.QRO.7.76.6.1906272046180.44@xxxxxxxxxxxxxxxxx/ I had suggested to do something like this: -- snip -- diff --git a/dir-iterator.c b/dir-iterator.c index 52db87bdc99f..85cd04b7b571 100644 --- a/dir-iterator.c +++ b/dir-iterator.c @@ -8,6 +8,7 @@ struct dir_iterator_level { /* The inode number of this level's directory. */ ino_t ino; + dev_t dev; /* * The length of the directory part of path at this level @@ -63,6 +64,7 @@ static int push_level(struct dir_iterator_int *iter) strbuf_addch(&iter->base.path, '/'); level->prefix_len = iter->base.path.len; level->ino = iter->base.st.st_ino; + level->dev = iter->base.st.st_dev; level->dir = opendir(iter->base.path.buf); if (!level->dir) { @@ -138,11 +140,14 @@ static int find_recursive_symlinks(struct dir_iterator_int *iter) int i; if (!(iter->flags & DIR_ITERATOR_FOLLOW_SYMLINKS) || - !S_ISDIR(iter->base.st.st_mode)) + !S_ISDIR(iter->base.st.st_mode) || + /* On Windows, st_ino is always set to 0 */ + !iter->base.st.st_ino) return 0; for (i = 0; i < iter->levels_nr; ++i) - if (iter->base.st.st_ino == iter->levels[i].ino) + if (iter->base.st.st_ino == iter->levels[i].ino && + iter->base.st.st_dev == iter->levels[i].dev) return 1; return 0; } -- snap -- Duy had also suggested to guard part of this using `USE_STDEV`, but as Matheus figured out that would not make sense, as the `USE_STDEV` flag really is meant to work around issues with network filesystems where `st_dev` can be unreliable. However, in the meantime I thought about this a bit more and I remembered how this is done elsewhere: I saw many recursive symlink resolvers that just have an arbitrary cut-off after following, say, 32 links. In fact, Git itself already has this in abspath.c: /* We allow "recursive" symbolic links. Only within reason, though. */ #ifndef MAXSYMLINKS #define MAXSYMLINKS 32 #endif But then, the patch in question uses `stat()` instead of `lstat()`, so we would not have any way to count the number of symbolic links we followed. Do we _have_ to, though? At some stage the path we come up with is beyond `PATH_MAX` and we can stop right then and there. Besides, the way `find_recursive_symlinks()` is implemented adds quadratic behavior. So I would like to submit the idea of simplifying the code drastically, by skipping the `find_recursive_symlinks()` function altogether. This would solve another issue I have with that function, anyway: The name suggests, to me at least, that we follow symlinks recursively. It does not. I think that could have been addressed by using the adjective "circular" instead of "recursive". But I also think there are enough reasons to do away with this function in the first place. Ciao, Dscho