2012/2/5 Junio C Hamano <gitster@xxxxxxxxx>: > Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > >> read-cache.c learned to produce version 2 or 3 depending on whether >> extended cache entries exist in 06aaaa0 (Extend index to save more flags >> - 2008-10-01), first released in 1.6.1. The purpose is to keep >> compatibility with older git. It's been more than three years since >> then and git has reached 1.7.9. Drop support for older git. > > Cc'ing this, as I suspect this would surely raise eyebrows of some people > who wanted to get rid of the version 3 format. Version 3 was a mistake because of the variable length record sizes. Saving 2 bytes on some records that don't use the extended flags makes the index file *MUCH* harder to parse. So much so that we should take version 3 and kill it, not encourage it as the default! IMHO, when these extended flags were added to make version 3 the following should have happened: - All records use the larger structure format with 4 bytes for the flags, not 2 bytes. - Change the trailing padding after the name to be a *SINGLE* \0 byte, and do not pad out to an 8 byte boundary. Both make it really hard to process the file, and the latter happens only for direct mmap usage, which we don't do anymore. We also have to consider the EGit and JGit user base as part of the ecosystem. We can't just kill a file format because git-core has been capable of reading its alternative since some arbitrary YYYY-MM-DD release date. We need to also consider when did some other major tools catch up and also support this format? FWIW JGit released index version 3 support in version 0.9.1, which shipped Sep 15, 2010. JGit/EGit were more than 2 years behind here. <thinking type="wishful" probability="never-happen" probably-inflating-flame-from="linus"> I have long wanted to scrap the current index format. I unfortunately don't have the time to do it myself. But I suspect there may be a lot of gains by making the index format match the canonical tree format better by keeping the tree structure within a single file stream, nesting entries below their parent directory, and keeping tree SHA-1 data along with the directory entry. For one thing the index would be able to register an empty subdirectory, rather than ignoring them. It would also better line up with the filesystem's readdir() handling, giving us more sane logic to compare what readdir() tells us exists against what the index thinks should be in the same file. And the overall index should be smaller, because we don't have to repeat the same path/to/a/file/for/every/file/in/that/same/directory/tree. Reconstructing the path strings at read time into a flat list should be pretty trivial, and still keep the parallel lstat calls running off a flat list working well for fast status operations. </thinking> -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html