On Wed, Aug 15, 2018 at 12:38:49PM -0700, Junio C Hamano wrote: This should answer Duys comments as well. > Torsten Bögershausen <tboegi@xxxxxx> writes: > [snip] > > Should the following be protected by core.checkstat ? > > if (check_stat) { > > I do not think such a if statement is strictly necessary. > > Even if check_stat tells us "when checking if a cached stat > information tells us that the path may have modified, use minimum > set of fields from the 'struct stat'", we still capture and update > the values from the same "full" set of fields when we mark a cache > entry up-to-date. So it all depends on why you are limiting with > check_stat. Is it because stdev is unusable? Is it because nsec is > unusable? Is it because ino is unusable? Only in the last case, > paying attention to check_stat will reduce the false positive. > > But then you made me wonder what value check_stat has on Windows. > If it is false, perhaps we do not even need the conditional > compilation, which is a huge plus. Agreed: check_stat is 0 on Windows, and inum is allways 0 in lstat(). I was thinking about systems which don't have inodes and inum, and then generate an inum in memory, sometimes random. After a reboot or a re-mount of the file systems those ino values change. However, for the initial clone we are fine in any case. > > >> + if (dup->ce_stat_data.sd_ino == st->st_ino) { > >> + dup->ce_flags |= CE_MATCHED; > >> + break; > >> + } > >> + } > >> +#endif > > > > Another thing is that we switch of the ASCII case-folding-detection-logic > > off for Windows users, even if we otherwise rely on icase. > > I think we can use fspathcmp() as a fallback. when inodes fail, > > because we may be on a network file system. > > > > (I don't have a test setup at the moment, but what happens with inodes > > when a Windows machine exports a share to Linux or Mac ?) > > > > Is there a chance to get the fspathcmp() back, like this ? > > If fspathcmp() never gives false positives, I do not think we would > mind using it like your update. False negatives are fine, as that > is better than just punting the whole thing when there is no usable > inum. And we do not care all that much if it is more expensive; > this is an error codepath after all. > > And from code structure's point of view, I think it makes sense. It > would be even better if we can lose the conditional compilation. The current implementation of fspathcmp() does not give false positvies, and future versions should not either. All case-insentive file systems have always treated 'a-z' equal to 'A-Z'. In FAT MS/DOS there had only been uppercase letters as file names, and `type file.txt` (the equivilant to ´cat file.txt´ in *nix) simply resultet in `type FILE.TXT` Later, with VFAT and later with HPFS/NTFS a file could be stored on disk as "File.txt". >From now on ´type FILE.TXT´ still worked, (and all other upper-lowercase combinations). This all is probably nothing new. The main point should be that fspathcmp() should never return a false positive, and I think we all agree on that. Now back to the compiler switch: Windows always set inum to 0 and I can't think about a situation where a file in a working tree gets inum = 0, can we use the following: static void mark_colliding_entries(const struct checkout *state, struct cache_entry *ce, struct stat *st) { int i; ce->ce_flags |= CE_MATCHED; for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; int folded = 0; if (dup == ce) break; if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; /* * Windows sets ino to 0. On other FS ino = 0 will already be * used, so we don't see it for a file in a Git working tree */ if (st->st_ino && (dup->ce_stat_data.sd_ino == st->st_ino)) folded = 1; /* * Fallback for NTFS and other case insenstive FS, * which don't use POSIX inums */ if (!fspathcmp(dup->name, ce->name)) folded = 1; if (folded) { dup->ce_flags |= CE_MATCHED; break; } } } > > Another thing we maybe want to see is if we can update the caller of > this function so that we do not overwrite the earlier checkout with > the data for this path. When two paths collide, we check out one of > the paths without reporting (because we cannot notice), then attempt > to check out the other path and report (because we do notice the > previous one with lstat()). The current code then goes on and overwrites > the file with the contents from the "other" path. > > Even if we had false negative in this loop, if we leave the contents > for the earlier path while reporting the "other" path, then the user > can get curious, inspect what contents the "other" path has on the > filesystem, and can notice that it belongs to the (unreported--due > to false negative) earlier path. > [snip]