Hi,

Michael, Thomas and I just had a lengthy discussion on IRC about racy entries. I'll use "simultaneously" from the perspective of the filesystem's mtimes: depending on your USE_NSEC setting, that means within the same second or within the same nanosecond.

Background: Racy Entries
------------------------

There are two cases of racy index entries:

(A)
    echo foo >foo
    git add foo
    echo bar >foo

If the latter two commands happen simultaneously, lstat() will match the index entry (both versions of foo are 4 bytes long, so the size does not give the change away either). Git handles this by checking whether foo.mtime >= index.mtime and, if so, doing a content check. Such entries are called racy.

(B)
    echo foo >foo
    git add foo       # (i)
    echo bar >foo
    sleep 2
    : >dummy
    git add dummy     # (ii)

If the commands before the sleep happen simultaneously, then foo.mtime has not changed since (i), but (ii) rewrites the index and thus advances index.mtime, so foo.mtime >= index.mtime no longer holds and the raciness check is defeated.

To handle this, git checks for racy entries *w.r.t. the old index* immediately before it writes a new index. For all[1] such entries it does a content check. All racy entries found to be modified get ce_size=0, which tells the next git invocation that "we know they are modified". We call them "smudged".

The Problem
-----------

The use of ce_size=0 is a problem for index v5. The current drafts drop the size field and instead fold it into stat_crc along with most of the other stat fields.

There are some obvious solutions:

 * Put the size field back, costing us 4 bytes per entry.
 * Use some other marker field for the v5 format, e.g., the stat crc.

Neither of these is good, for an entirely different reason: the current scheme checks *all* entries for being racy w.r.t. the old index before any write. This completely defeats the point of index v5, which is to *avoid* loading the entire index for small changes.

Proposed Solution
-----------------

(Michael, we have adapted it somewhat since you left IRC.)

When writing an entry, check whether ce_mtime >= index.mtime. If so, write out ce_mtime=0. The index.mtime here is a lower bound on the mtime of the new index, obtained, e.g.,
by touching the index and then stat()ing it immediately before writing out the changed entries.

Note that this is a fundamentally different approach from the one taken in v[2-4] indexes. In the old approach, it is the *next* writer's responsibility to ensure that all racy entries are either truly clean or smudged (since, once the new index is written, they will presumably no longer be detected as racy). In the new approach, racy entries are smudged immediately and remain so until they are updated.

Footnotes:

[1] Ignoring the case where st_size==0 to begin with, which needs some extra argument because st_size is also the smudge marker.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
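For illustration, here is a minimal C sketch contrasting the two schemes discussed above: the v2-v4 pass over *all* entries before a write, and the proposed v5 rule applied only to the entries being written. It is a hypothetical, simplified model, not git's actual code. The struct `entry`, its fields, and the helpers `is_racy()`, `smudge_racy_v4()`, `index_mtime_lower_bound()` and `mtime_to_write_v5()` are invented names; times are whole seconds (the !USE_NSEC case); the content check is reduced to a boolean flag. Only `utime()` and `stat()` are real (POSIX) calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <utime.h>

/* Hypothetical, simplified index entry (NOT git's struct cache_entry).
 * Times are whole seconds, i.e. the !USE_NSEC case. */
struct entry {
    const char *name;
    long mtime;            /* mtime recorded in the index entry */
    unsigned size;         /* size recorded in the index entry */
    bool content_changed;  /* stand-in for the result of a content check */
};

/* Case (A): the entry is racy if its recorded mtime is not older than
 * the index it is checked against, so stat data alone cannot be trusted. */
static bool is_racy(const struct entry *ce, long index_mtime)
{
    return ce->mtime >= index_mtime;
}

/* v2-v4 scheme: immediately before writing a new index, check ALL entries
 * for raciness w.r.t. the OLD index's mtime; racy entries whose content
 * check shows a modification are smudged with size = 0. */
static void smudge_racy_v4(struct entry *entries, int n, long old_index_mtime)
{
    for (int i = 0; i < n; i++)
        if (is_racy(&entries[i], old_index_mtime) && entries[i].content_changed)
            entries[i].size = 0;
}

/* Lower bound on the new index's mtime: touch the file (utime() with a
 * NULL times argument sets it to the current time), then stat() it.
 * Later writes set the mtime to a time >= this one, assuming the clock
 * does not go backwards. Returns -1 on error. */
static long index_mtime_lower_bound(const char *index_path)
{
    struct stat st;

    if (utime(index_path, NULL) != 0 || stat(index_path, &st) != 0)
        return -1;
    return (long)st.st_mtime;
}

/* Proposed v5 scheme: only for entries actually being written, compare
 * against the lower bound on the NEW index's mtime and write mtime = 0
 * for racy ones. No content check, and no pass over untouched entries. */
static long mtime_to_write_v5(const struct entry *ce, long new_index_mtime)
{
    return is_racy(ce, new_index_mtime) ? 0 : ce->mtime;
}
```

In terms of case (B): an entry recorded at time 100 is racy against an index written at 100, but `is_racy()` returns false once (ii) moves index.mtime to 102. The v2-v4 scheme closes that gap with its pre-write pass over every entry, whereas under the v5 rule foo's entry would already have been written with mtime 0 at (i) and stays smudged until it is updated.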