Signed-off-by: Ramsay Jones <ramsay@xxxxxxxxxxxxxxxxxxx>
---

Whilst testing an msvc-built git on cygwin, I noticed that
git-count-objects was producing different results to the
cygwin version, viz:

    $ ./git --version
    git version 1.6.5.4.gf034d.MSVC
    $ git --version
    git version 1.6.5
    $ ./git-count-objects
    60 objects, 283 kilobytes
    $ git count-objects
    60 objects, 210 kilobytes
    $ git config core.filemode false
    $ git count-objects
    60 objects, 297 kilobytes
    $

Note also that the cygwin version of git gives two different
answers, in the *same* repository, depending on the setting of
core.filemode. (since the "win32 stat() functions" in
compat/cygwin.c are used when core.filemode is false)

Having looked at the msvc code-path, I also noticed something
else a bit odd; the value printed by the msvc version should be
a *lower bound* for the amount of disk-space used (since it simply
totals the actual file sizes). However, as you can see from the
above, when core.filemode is true, the cygwin version of git is
*underestimating* even this (210Kb vs 283kb).

A quick trip to the debugger confirmed that (st.st_blocks * 512)
is less than st.st_size for several files. So, it looked like the
block-size was not in units of 512 bytes; so if we look at
/usr/include/sys/param.h we find:

    #define DEV_BSIZE 1024

and in /usr/include/sys/stat.h we find:

    #define S_BLKSIZE 1024

which seems to indicate a 1K block-size instead.
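To make the arithmetic concrete, here is a minimal sketch of the check described above. The helper names (blocks_to_bytes, underestimates) are hypothetical and are not part of git; the sample numbers are one file from the test run further down (10058 bytes, st_blocks of 12).

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not git code): the number of bytes implied
 * by a block count when each block is 'unit' bytes long.
 */
long long blocks_to_bytes(long long blocks, long long unit)
{
	return blocks * unit;
}

/*
 * Returns true when the byte count implied by 'blocks' is smaller
 * than the file size itself. Sparse or compressed files aside, that
 * suggests 'unit' is not the unit st_blocks is really counted in.
 */
bool underestimates(long long size, long long blocks, long long unit)
{
	return blocks_to_bytes(blocks, unit) < size;
}
```

With a 512-byte unit, 12 blocks give 6144 bytes, which is less than the 10058-byte file size, so underestimates(10058, 12, 512) is true. With a 1024-byte unit the same 12 blocks give 12288 bytes and the check returns false.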

Also, different filesystem types may use different block-sizes,
which is why an st.st_blksize was added to struct stat; it seems
that the Cygwin struct stat contains the st_blksize field, so let's
try a quick test program:

    $ cat junk.c
    #include <stdio.h>
    #include <stdlib.h>	/* for exit() */
    #include <sys/stat.h>

    int main(int argc, char *argv[])
    {
        int i, bytes = 0, tot_5 = 0, tot_b = 0;

        for (i=1; i< argc; i++) {
            struct stat st;
            if (lstat(argv[i], &st))
                printf("can't lstat '%s'\n", argv[i]);
            else {
                int s = (int)st.st_size;	/* file size in bytes */
                int b = (int)st.st_blocks;	/* reported block count */
                int f = (int)st.st_blksize;	/* reported block size */
                int d = (b * 512);		/* assuming 512-byte units */
                int e = (b * f);		/* using st_blksize units */
                printf("%8d, ", s);
                printf("%2d*512=%-6d (%6d), ", b, d, d-s);
                printf("%2d*%d=%-6d (%6d)\n", b, f, e, e-s);
                bytes += s;
                tot_5 += d;
                tot_b += e;
            }
        }
        printf("total bytes: %6d\n", bytes);
        printf("b * 512: %6d (%d)\n", tot_5, tot_5 - bytes);
        printf("b * blksize: %6d (%d)\n", tot_b, tot_b - bytes);
        exit(0);
    }
    $ ls .git/objects/??/* | xargs ./junk.exe
         148,  1*512=512    (   364),  1*1024=1024   (   876)
       10058, 12*512=6144   ( -3914), 12*1024=12288  (  2230)
        3701,  4*512=2048   ( -1653),  4*1024=4096   (   395)
         463,  4*512=2048   (  1585),  4*1024=4096   (  3633)
         148,  1*512=512    (   364),  1*1024=1024   (   876)
       10056, 12*512=6144   ( -3912), 12*1024=12288  (  2232)
         198,  1*512=512    (   314),  1*1024=1024   (   826)
        1782,  4*512=2048   (   266),  4*1024=4096   (  2314)
         192,  1*512=512    (   320),  1*1024=1024   (   832)
        3801,  4*512=2048   ( -1753),  4*1024=4096   (   295)
       14851, 16*512=8192   ( -6659), 16*1024=16384  (  1533)
        3761,  4*512=2048   ( -1713),  4*1024=4096   (   335)
          52,  1*512=512    (   460),  1*1024=1024   (   972)
         956,  4*512=2048   (  1092),  4*1024=4096   (  3140)
         956,  4*512=2048   (  1092),  4*1024=4096   (  3140)
        3703,  4*512=2048   ( -1655),  4*1024=4096   (   393)
       10055, 12*512=6144   ( -3911), 12*1024=12288  (  2233)
    total bytes:  64881
    b * 512:  45568 (-19313)
    b * blksize:  91136 (26255)
    $

Note that, in addition to confirming the 1K block-size, it appears
that (on NTFS) files less than 1K are allocated a single block
whereas larger files use a multiple of 4 blocks.
Well, that is only a guess but it at least sounds plausible! ;-)

This patch implements a simple solution which has the useful
property of returning a single answer, irrespective of the
core.filemode setting.

ATB,
Ramsay Jones

 Makefile |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 5d5976f..5624563 100644
--- a/Makefile
+++ b/Makefile
@@ -783,6 +783,10 @@ ifeq ($(uname_O),Cygwin)
 	NO_FAST_WORKING_DIRECTORY = UnfortunatelyYes
 	NO_TRUSTABLE_FILEMODE = UnfortunatelyYes
 	OLD_ICONV = UnfortunatelyYes
+	# The st_blocks field is not in units of 512 bytes, as the code
+	# expects, which leads to an under-estimate of the disk space used.
+	# In order to use an alternate algorithm, we claim to lack st_blocks.
+	NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
 	# There are conflicting reports about this.
 	# On some boxes NO_MMAP is needed, and not so elsewhere.
 	# Try commenting this out if you suspect MMAP is more efficient
-- 
1.6.5