On Fri, Aug 03, 2018 at 02:23:17PM -0400, Jeff Hostetler wrote:

> > Maybe. It might not work as ino_t. Or it might be expensive to get. Or
> > maybe it's simply impossible. I don't know much about Windows. Some
> > searching implies that NTFS does have a "file index" concept which is
> > supposed to be unique.
>
> This is hard and/or expensive on Windows. Yes, you can get the
> "file index" values for an open file handle with a cost similar to
> an fstat(). Unfortunately, the FindFirst/FindNext routines (equivalent
> to the opendir/readdir routines), don't give you that data. So we'd
> have to scan the directory and then open and stat each file. This is
> terribly expensive on Windows -- and the reason we have the fscache
> layer (in the GfW version) to intercept the lstat() calls whenever
> possible.

I think that high cost might be OK for our purposes here. This code
would _only_ kick in during a clone, and then only on the error path
once we knew we had a collision during the checkout step.

> Another thing to keep in mind is that the collision could be because
> of case folding (or other such nonsense) on a directory in the path.
> I mean, if someone on Linux builds a commit containing:
>
>     a/b/c/D/e/foo.txt
>     a/b/c/d/e/foo.txt
>
> we'll get a similar collision as if one of them were spelled "FOO.txt".

True, though I think that may be OK. If you had conflicting directories
you'd get a _ton_ of duplicates listed, but that makes sense: you
actually have a ton of duplicates.

> Also, do we need to worry about hard-links or symlinks here?

I think we can ignore hardlinks. Git never creates them, and we know the
directory was empty when we started. Symlinks should be handled by using
lstat(). (Obviously that's for a Unix-ish platform).

> I'm sure there are other edge cases here that make reporting
> difficult; these are just a few I thought of. I guess what I'm
> trying to say is that as a first step just report that you found
> a collision -- without trying to identify the set existing objects
> that it collided with.

I certainly don't disagree with that. :)

> > At any rate, until we have an actual plan for Windows, I think it would
> > make sense only to split the cases into "has working inodes" and
> > "other", and make sure "other" does something sensible in the meantime
> > (like mention the conflict, but skip trying to list duplicates).
>
> Yes, this should be split. Do the "easy" Linux version first.
> Keep in mind that there may also be a different solution for the Mac.

I assumed that an inode-based solution would work for Mac, since it's
mostly BSD under the hood. There may be subtleties I don't know about,
though.

-Peff
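
To make the "has working inodes" case concrete, here is a rough sketch of
the error-path lookup discussed above. It is Python rather than Git's C,
and find_collisions() and its arguments are made-up names for illustration,
not anything in Git. It also assumes that a zero st_ino means the platform
has no usable inode, which is a guess and not a documented rule.

```python
import os

def find_collisions(worktree, colliding_path, checked_out_paths):
    """List earlier paths that are the same on-disk file as colliding_path.

    Hypothetical helper: meant to run only on the error path, after a
    clone has already noticed a collision during checkout.
    Returns None on the "other" platforms (no usable inode), so the caller
    can mention the conflict without trying to list duplicates.
    """
    # lstat() rather than stat(), so symlinks are compared as themselves.
    target = os.lstat(os.path.join(worktree, colliding_path))
    if target.st_ino == 0:
        # Assumption: a zero inode means "no working inodes" here.
        return None

    matches = []
    for path in checked_out_paths:
        if path == colliding_path:
            continue
        try:
            st = os.lstat(os.path.join(worktree, path))
        except FileNotFoundError:
            continue
        # Same device and inode means both names refer to one file, whether
        # the case folding hit the file name or a directory in the path.
        if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
            matches.append(path)
    return matches
```

The loop stats every path checked out so far, which is the expensive part
on Windows, but it only runs once a collision is already known. With
case-folded directories it would report one match per file underneath
them, which is the "ton of duplicates" case.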