On Wed, 2015-11-04 at 18:38 -0800, Junio C Hamano wrote: > David Turner <dturner@xxxxxxxxxxxxxxxx> writes: > > > In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat. This > > returns OWNER_CHANGED if a file has changed ownership since the index > > was updated. Do we actually care about that particular case? Or really > > anything other than DATA_CHANGED? > > That's a 10-year old code and there aren't that many people left > who can answer the original rationale, I am afraid ;-) > > In general, "Do we actually care?" is not the question we ask in > this area of the code. "Does it help us to catch real changes, or > does it change spuriously to make it too unreliable a signal to be > useful?" is the question that drives the design of this part of the > system. > > DATA_CHANGED is "we know the contents are different without even > looking at the data". If the size is different from the last time > we hashed the data, the contents must have changed. The inverse is > not true (and that is half of the "racy git" issue). > > Other *_CHANGED are finely classified only because originally we > didn't really know which are useful to treat as notable change > event, and "changed" variable had sufficient number of bits to hold > different classification, so that we could pick and choose which > ones we truly care. We knew MTIME was useful in the sense that even > if the size is the same, updated mtime is good enough indication > that the stuff has changed, even to "make" utility. > > INODE and CTIME are not so stable on some filesystems (e.g. inum may > not be stable on a network share across remount) and in some > environments (e.g. some virus scanners touch ctime to mark scanned > files, cf. 1ce4790b), and would trigger false positives too often to > be useful. We always paid attention to them initially, but there > are configurations to tell Git not raise them these days. > > OWNER probably falls into a category that is stable enough to be > useful, as the most likely way for it to change is not by running > "chown" on the file in-place (which does not change the contents), > but by running "mv" to drop another file owned by somebody else to > the original location (which likely does change the contents). At > the same time, "mv" a different file into the path would likely > trigger changes to INODE and MTIME as well, so it cannot be more > than belt-and-suspenders measure to catch modification. In that > sense ignoring OWNER would not hurt too much. > > If it changes spuriously to make it too unreliable a signal to be > useful, it certainly is OK to introduce a knob to ignore it. It > might even make sense to ignore it unconditionally if the false hit > happens too frequently, but offhand my gut reaction is that there > may be something wrong in the environment (i.e. system outside Git > in which Git runs) if owner/group changes spuriously to cause > issues. Thanks. The only case where we saw it was with our watchman code, which lies about ownership (to save space/time). We're going to try ignoring OWNER_CHANGED in our watchman branch, and if that fixes the issue for our users, we'll stop worrying about it on the theory that Duy's watchman stuff is the long-term path forward, and it doesn't have this issue. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html