Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>
>> I have been drafting the Version 5 of the index format over the past
>> few days with the help of Thomas Rast, Michael Haggerty, cmn and
>> barrbrain on IRC.
>
> Hrm, so if there is anything glaringly wrong below, should I reduce the
> "trustable reviewer karma point" from these people?  Or did you forget to
> say "but remaining errors are mine" ;-)?

Heh.  Partly it's my fault: I told Thomas to get this out tonight for a
single reason.  Michael's CRC-over-stat idea is so radical that I wanted
to know whether there's anything wrong with it.

So the misunderstanding is really that this is not anywhere near the
final result.

But yeah:

> > 32-bit crc32 checksum over ctime seconds, ctime nanoseconds,
> > ino, file size, dev, uid, gid (All stat(2) data except mtime) [7]
>
> Giving occasional false positive to "did this change?" is acceptable, but
> any false negative is absolutely unacceptable.  How does this work with
> something like "racy git" situation (i.e. coming from "mtime happens to be
> the same as before") but due to crc32 collisions?
>
> If there is no good answer to the above question, I would have to say that
> anybody who suggested or passed this through review loses all the
> accumulated reviewer karma points (if s/he has accumulated any, that is).

If this is a problem, then the fault is with me (and I do hope I have
some karma to lose...).

Note that the scenario you outlined is not an issue.  The entries other
than mtime and ctime are really only compared for equality, see
e.g. ce_match_stat_basic().  Comparisons for equality never have false
negatives with any hash function.  Collisions are false positives.

Upon closer reading I noticed that the ie_match_stat and ce_match_stat_*
family actually do distinguish between basically all the fields that can
change.  But this knowledge is never actually put to use, except that
there's an optimized code path where

* mode/type difference implies changed entry

* size difference implies changed entry

So that does constitute an argument to not put the size in the
stat-hash.  Mode and type aren't part of it in the proposed format
anyway.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
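
For illustration only, here is a minimal sketch of the equality check
discussed above.  It is not code from git or from the proposed format:
the struct, the 32-bit field widths, the big-endian serialization and
the names stat_fields, stat_hash() and stat_hash_matches() are all
assumptions made up for this example.  Only the field order follows the
quoted draft.  Identical stat data always yields an identical checksum,
so an unchanged entry can never be reported as changed; a collision is
the only way two different sets of stat data can compare as equal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the stat(2) fields covered by the checksum (no mtime). */
struct stat_fields {
	uint32_t ctime_sec;
	uint32_t ctime_nsec;
	uint32_t ino;
	uint32_t size;
	uint32_t dev;
	uint32_t uid;
	uint32_t gid;
};

/* Plain bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Store a 32-bit value in big-endian order (assumed on-disk order). */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/* Checksum over all seven fields, serialized in a fixed order. */
static uint32_t stat_hash(const struct stat_fields *sf)
{
	unsigned char buf[7 * 4];

	put_be32(buf + 0, sf->ctime_sec);
	put_be32(buf + 4, sf->ctime_nsec);
	put_be32(buf + 8, sf->ino);
	put_be32(buf + 12, sf->size);
	put_be32(buf + 16, sf->dev);
	put_be32(buf + 20, sf->uid);
	put_be32(buf + 24, sf->gid);
	return crc32_buf(buf, sizeof(buf));
}

/* Nonzero when the stored checksum says "these fields are unchanged". */
static int stat_hash_matches(uint32_t stored, const struct stat_fields *now)
{
	return stored == stat_hash(now);
}
```

A caller would keep stat_hash() of the fields seen at index-write time
and later pass it to stat_hash_matches() together with fresh stat data.
Note that a mismatch only says that something changed, not which field;
keeping the size outside the checksum would preserve the "size
difference implies changed entry" shortcut mentioned above.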