(Please CC me, I'm not on the list) Hi folks, I've spent some time debugging an issue and I'd like to share the results. The conclusion of my debugging is that git does not currently handle changing inode numbers on files well. I have a custom Fuse filesystem, and fuse dynamically allocates inode numbers to paths, but keeps a limited cache of inode -> name mappings, causing the inodes to change over time. Now of course, you'll probably say, "it's the filesystem's fault, git can't be expected to cope with that". You'll be right of course, but since I already spent the time digging into this and figuring out what goes on inside git in this case, I thought I might as well share the analysis, just in case someone sees an easy fix in here, or in case someone else stumbles upon this problem as well. So, the actual problem I was seeing is that running "git status" showed all symlinks as "modified", even though they really were identical between the working copy, index and HEAD. Interestingly enough this only happened when running "git status" without further arguments, when running on a subdirectory, it would show no changes as expected. I compared the output of stat to a hexdump of the index file and found that everything matched, except for the inode numbers. I originally thought I was misinterpreting what I saw, but gdb confirmed that it were indeed the inode numbers that git observed as different. Now, I could have stopped here and started trying to fix my filesystem instead. But it was still weird that this problem only existed for symlinks and that normal files acted as expected. So I dug in a bit deeper, hoping to find some way to make this work for symlinks as well. So, here's what happens (IIUC): - cmd_status calls refresh_index, which calls refresh_cache_ent for every entry in the index. - refresh_cache_ent notices that the inode number has changed (for both symlinks and regular files) and compares the file / symlink contents. - refresh_cache_ent sees the content hasn't changed, so it calls fill_stat_cache_info to update the stat info. - fill_stat_cache_info sets the EC_UPTODATE flag on the entry, but only if it is a regular file. - cmd_status calls wt_status_collect which calls wt_status_collect_changes_worktree which calls run_diff_files. - run_diff_files skips regular files, because of the EC_UPTODATE flag. For symlinks, however, it checks the stat info and notices that the inode number has changed (again). It does not do a content check at this point, but instead just outputs the file as "modified". It turned out that the reason running "git status" on a subdirectory did appear to work, was that the number of files in the subdir wasn't big enough to overflow the inode number cache fuse keeps, so that numbers didn't change in this case (the problem _did_ occur when trying a bigger subdirectory). So, it seems that git just doesn't cope well with changing inode numbers because it checks the content in a first pass in refresh_index, but only checks the stat info in the second pass in run_diff_files. The reason it does work for regular files is EC_UPTODATE optimization introduced in eadb5831: Avoid running lstat(2) on the same cache entry. So, let's see if I can fix my filesystem now ;-) Gr. Matthijs
Attachment:
signature.asc
Description: Digital signature