Re: File owner/group and git

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2015-11-04 at 18:38 -0800, Junio C Hamano wrote:
> David Turner <dturner@xxxxxxxxxxxxxxxx> writes:
> 
> > In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat.  This
> > returns OWNER_CHANGED if a file has changed ownership since the index
> > was updated.  Do we actually care about that particular case?  Or really
> > anything other than DATA_CHANGED?
> 
> That's a 10-year old code and there aren't that many people left
> who can answer the original rationale, I am afraid ;-)
> 
> In general, "Do we actually care?" is not the question we ask in
> this area of the code.  "Does it help us to catch real changes, or
> does it change spuriously to make it too unreliable a signal to be
> useful?" is the question that drives the design of this part of the
> system.
> 
> DATA_CHANGED is "we know the contents are different without even
> looking at the data".  If the size is different from the last time
> we hashed the data, the contents must have changed.  The inverse is
> not true (and that is half of the "racy git" issue).
> 
> Other *_CHANGED are finely classified only because originally we
> didn't really know which are useful to treat as notable change
> event, and "changed" variable had sufficient number of bits to hold
> different classification, so that we could pick and choose which
> ones we truly care.  We knew MTIME was useful in the sense that even
> if the size is the same, updated mtime is good enough indication
> that the stuff has changed, even to "make" utility.
> 
> INODE and CTIME are not so stable on some filesystems (e.g. inum may
> not be stable on a network share across remount) and in some
> environments (e.g. some virus scanners touch ctime to mark scanned
> files, cf. 1ce4790b), and would trigger false positives too often to
> be useful.  We always paid attention to them initially, but there
> are configurations to tell Git not raise them these days.
> 
> OWNER probably falls into a category that is stable enough to be
> useful, as the most likely way for it to change is not by running
> "chown" on the file in-place (which does not change the contents),
> but by running "mv" to drop another file owned by somebody else to
> the original location (which likely does change the contents).  At
> the same time, "mv" a different file into the path would likely
> trigger changes to INODE and MTIME as well, so it cannot be more
> than belt-and-suspenders measure to catch modification.  In that
> sense ignoring OWNER would not hurt too much.
> 
> If it changes spuriously to make it too unreliable a signal to be
> useful, it certainly is OK to introduce a knob to ignore it.  It
> might even make sense to ignore it unconditionally if the false hit
> happens too frequently, but offhand my gut reaction is that there
> may be something wrong in the environment (i.e. system outside Git
> in which Git runs) if owner/group changes spuriously to cause
> issues.

Thanks.

The only case where we saw it was with our watchman code, which lies
about ownership (to save space/time).  We're going to try ignoring
OWNER_CHANGED in our watchman branch, and if that fixes the issue for
our users, we'll stop worrying about it on the theory that Duy's
watchman stuff is the long-term path forward, and it doesn't have this
issue.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]