I couldn't figure out how to make `stat` print exactly the same
format as the utility you linked, but everything seems to match up
(except the nanosecond part of ctime/mtime, which I could not get from `stat`).
BUT!
The Windows index is definitely corrupted. The utility crashes on it.
After modifying the utility to compare `entry->ce_namelen` against
`strlen(entry->name)`, I found out that they differ for a bunch of
entries, which at some point causes an unfortunate jump which lands
outside of the index.
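The same check can be approximated outside of Git. What follows is a minimal Python sketch, not the modified utility itself. It assumes an index in format version 2 or 3 (version 4 compresses path names and is not handled), reads the 12-bit name length from each entry's on-disk flags field, and compares it with the actual NUL-terminated name. `check_name_lengths` is a hypothetical name chosen for illustration.

```python
import struct

def check_name_lengths(data: bytes):
    """Return (name, stored_len, actual_len) for every entry whose
    stored name length disagrees with the NUL-terminated name."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version not in (2, 3):
        raise ValueError("this sketch only handles index versions 2 and 3")

    mismatches = []
    pos = 12  # first entry starts right after the 12-byte header
    for _ in range(count):
        # 40 bytes of stat data + 20 bytes of SHA-1 precede the flags.
        (flags,) = struct.unpack(">H", data[pos + 60:pos + 62])
        name_start = pos + 62
        if version == 3 and flags & 0x4000:
            name_start += 2  # skip the extended flags word
        # Find the real end of the name instead of trusting the stored length.
        name_end = data.index(b"\0", name_start)
        actual = name_end - name_start
        stored = flags & 0x0FFF  # low 12 bits; 0xFFF means "0xFFF or longer"
        if stored != min(actual, 0x0FFF):
            name = data[name_start:name_end].decode("utf-8", "replace")
            mismatches.append((name, stored, actual))
        # Entries are padded with 1-8 NULs to a multiple of 8 bytes.
        pos += (name_end - pos + 8) & ~7
    return mismatches
```

Called as `check_name_lengths(open(".git/index", "rb").read())`, it returns an empty list when every stored length agrees. Because it walks entries by scanning for the terminating NUL, a wrong stored length cannot send it outside the index.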
Hence a question: shouldn't git validate this somehow? I get that the
length of the name is stored separately for speed, but maybe we can have
a special "validate the index" subcommand?
I have fixed the wrong name lengths via the same utility, hoping that it
would help. But sadly, it didn't.
Modifying the script further, I made it stat every single cache entry's
actual file and compare everything. Et voilà: mode differs! Git for
Windows apparently defaults everything to 644, while NTFS-3G with
UserMapping enabled tries to support actual permissions, so some files
have 664 and others have 777 (and files on the drive that are not in
the repo have yet other modes).
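One caveat applies to a raw mode comparison. Git's index-format documentation states that only 0644 and 0755 are valid permission bits for regular files, so an on-disk 0664 is expected to appear as 644 in the index, while 0777 would normally be recorded as 755. Below is a minimal Python sketch of a comparison that normalizes the stat mode first. The function names are made up for illustration, directories and gitlinks are not handled, and the `core.fileMode` setting is ignored.

```python
import stat

def canonical_mode(st_mode: int) -> int:
    """Reduce a stat mode to the form the index is documented to store."""
    if stat.S_ISLNK(st_mode):
        return 0o120000  # symlinks carry no permission bits in the index
    if stat.S_ISREG(st_mode):
        # Only the owner-executable bit is kept for regular files.
        return 0o100755 if st_mode & 0o100 else 0o100644
    return st_mode  # other file types are out of scope for this sketch

def mode_differs(index_mode: int, st_mode: int) -> bool:
    """True if the index mode disagrees with the normalized on-disk mode."""
    return index_mode != canonical_mode(st_mode)
```

Under these assumptions, `mode_differs(0o100644, 0o100664)` is false, while `mode_differs(0o100644, 0o100777)` is true, so only the 777 files would count as a genuine mode mismatch.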
But alas, backing up the index and changing the mode field to what stat
actually reports didn't help either. It still seems to me like git
should be updating this stuff on its own if it needs to keep track of
it, but whatever, the issue seems to lie somewhere else.
All in all, definitely seems like a git bug to me. Especially
considering the name length corruption. I guess I'll try to check out
the git sources sometime in the future and play around with them, maybe
I'll find something then. For now, I will use the Linux-native checkout
of my repo and be careful to synchronize the two checkouts via remote
and not forget any unpushed commits. The crime was not perfect after all =(
On 8/30/24 18:55, brian m. carlson wrote:
On 2024-08-30 at 19:25:56, Roman Sandu wrote:
The stat output for a random file in the root of the repository is as
follows:
```
File: <CENSORED>
Size: 91876 Blocks: 184 IO Block: 4096 regular file
Device: 259,2 Inode: 4630629 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/romasandu) Gid: ( 1000/romasandu)
Access: 2024-08-29 17:41:04.855126300 +0300
Modify: 2024-08-29 17:41:04.855609000 +0300
Change: 2024-08-29 17:41:04.855609000 +0300
Birth: -
```
Maybe lack of a birth stat is what drives git crazy?
That doesn't exist in POSIX, so it isn't used in Git.
I looked at the Ubuntu git package and it doesn't use `USE_NSEC`, so
your lack of nanosecond precision in timestamps probably isn't the
problem here.
You may want to try using a utility like
https://github.com/shogo82148/git-dump-index to dump the index and find
out what might have changed. You can use `stat -c` to write the data
for the actual files in the same format, and then run `diff` on the two
to find out where they disagree. Or, perhaps you can just eyeball it,
in case there's something obvious (like a `uid` difference).
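If getting `stat -c` to match the dump format proves awkward, the same fields can be collected with a short script. This is a minimal Python sketch under some assumptions: the field names and output layout are invented here and would need adjusting to whatever the index dump prints, and the size is truncated to 32 bits because that is how the index format stores it.

```python
import os

def stat_record(path):
    """Collect the stat fields that an index entry also records."""
    st = os.lstat(path)  # lstat, so symlinks are not followed
    return {
        "ctime": divmod(st.st_ctime_ns, 10**9),  # (seconds, nanoseconds)
        "mtime": divmod(st.st_mtime_ns, 10**9),
        "dev": st.st_dev,
        "ino": st.st_ino,
        "mode": oct(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size & 0xFFFFFFFF,  # index keeps the size in 32 bits
    }

def format_record(path):
    """Render one file as 'key: value' lines suitable for diff."""
    lines = [f"path: {path}"]
    for key, value in stat_record(path).items():
        if isinstance(value, tuple):
            value = f"{value[0]}.{value[1]:09d}"  # seconds.nanoseconds
        lines.append(f"{key}: {value}")
    return "\n".join(lines)
```

Printing `format_record(p)` for each tracked path and writing the index dump in the same layout gives two text files that `diff` can compare directly.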
Or, you could try instrumenting `match_stat_data` or
`stat_validity_check` in `statinfo.c` and printing the data that's
changed.
You might also try disabling the untracked cache and see if that fixes the
problem. It might be that there _is_ a bug in that the untracked cache
information isn't correctly refreshed when it was originally written on
a different platform. It's known that Windows writes different
information into the index than Unix systems and perhaps that
information doesn't get refreshed properly.
One other thought: Windows stores symlinks with a different size than
most Unix systems. Windows tends to report a full block size for them,
whereas Unix reports the length of the link target in bytes. That
definitely breaks using symlinks in a repository across Windows and WSL.
I don't know if that's what's going on here, but of course it could be
related.
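The symlink size difference is easy to check on a given mount. Here is a minimal Python sketch, with a made-up function name, that creates a throwaway symlink and returns the size `lstat` reports next to the byte length of the target. On a typical Linux filesystem the two numbers should be equal. Whether they are on NTFS-3G or under WSL is exactly what this would test, so the `directory` argument should point at the filesystem in question.

```python
import os
import tempfile

def symlink_sizes(target: str, directory=None):
    """Return (st_size reported by lstat, byte length of the link target)."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        link = os.path.join(d, "link")
        os.symlink(target, link)  # the target does not need to exist
        reported = os.lstat(link).st_size
    return reported, len(os.fsencode(target))
```

If the two values differ on the drive in question, the size recorded in the index for a symlink on one platform would not match what the other platform reports.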