Re: Does content provenance matter?

On 8 May 2012 09:13, Kelly Dean <kellydeanch@xxxxxxxxx> wrote:
>
> --- On Mon, 5/7/12, PJ Weisberg <pj@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > But there could be any number of unrelated commits newer than "Bar"
> > but older than "Revert Bar" on other branches.  Even if you could
> > trust the timestamps to be accurate (you can't), you still can't
> > determine a commit's parent unambiguously.
> Therefore, provenance does matter, and it must be explicitly recorded
> because it can't necessarily be correctly and fully deduced from content
> alone. And git does record inter-commit provenance.
> However, git doesn't record intra-commit provenance, as I mentioned in my
> original message. My question is: why this discrepancy? Either provenance
> matters, or it doesn't; why record it in one case but not the other?

I don't think it has been firmly decided that provenance is unimportant
at the intra-commit scope; rather, as you stated, such information is
simply not available to us.

My understanding is that git makes a best-guess effort to track the
flow of content through the repository. If content is moved, by
deleting it in one place and adding it in another, that is easy to see
in git. However, if content is merely added, and that same content
occurs in multiple places in the repository, there is no sane way of
knowing where it came from.
Even if the added content occurred in only one other place, you would
need to check every single file for every single hunk added in every
single commit to determine where that content came from. Why stop
there, though? We might be copying the content from some other branch
we don't have checked out at the moment, so every time we commit,
let's search the entire repository's history for an occurrence of each
hunk we are adding. That way lies madness.
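To make the cost concrete, here is a minimal Python sketch of the naive
search described above. It is not how git works, since git performs no
such search at commit time. The snapshot model (a dict of path to list
of lines), the hunk-as-tuple representation, and the name
`trace_added_block` are all hypothetical, chosen only for illustration.

```python
def trace_added_block(block, removed_blocks, history):
    """Guess where a hunk added in one commit came from (hypothetical helper).

    block:          tuple of lines added by the commit
    removed_blocks: dict path -> list of tuples of lines removed by the same commit
    history:        list of snapshots (dict path -> list of lines), parent first
    Returns (verdict, sources, comparisons). For "moved", sources are paths;
    otherwise they are (depth, path, offset) triples.
    """
    # Cheap case: the same lines were deleted elsewhere in this commit, so it is a move.
    moved_from = [path for path, blocks in removed_blocks.items() if block in blocks]
    if moved_from:
        return "moved", moved_from, 0

    # Expensive case: slide the hunk over every file of every snapshot in history.
    sources, comparisons, n = [], 0, len(block)
    for depth, snapshot in enumerate(history):
        for path, lines in snapshot.items():
            for offset in range(len(lines) - n + 1):
                comparisons += 1
                if tuple(lines[offset:offset + n]) == block:
                    sources.append((depth, path, offset))

    paths = {path for _, path, _ in sources}
    if not paths:
        return "new", [], comparisons
    # One path is a plausible copy source; several paths leave no sane way to choose.
    return ("copied" if len(paths) == 1 else "ambiguous"), sources, comparisons


history = [
    {"a.c": ["x", "y", "z"], "b.c": ["y", "z"]},  # parent snapshot
    {"a.c": ["x", "y"]},                          # grandparent snapshot
]
print(trace_added_block(("y", "z"), {}, history))
# ('ambiguous', [(0, 'a.c', 1), (0, 'b.c', 0)], 4)
```

The `comparisons` counter grows with the number of snapshots times the
number of lines in each, and that cost is paid for every hunk of every
commit, which is the scaling problem described above.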

With regard to file renames, all that has been shown so far is that
provenance matters for commit parents. Nothing about the similarities
between the commit-parent and rename situations you mention leads me
to conclude that, because provenance is important in one, it is
important in the other.

Indeed, one of the arguments against provenance being important in the
file rename case is that we can generally reconstruct this information
from what is already recorded, unlike in the general commit parent
case. There are additional arguments: simply recording file name
changes doesn't capture many situations we would like to know about,
for example when a single file is split into two files. Tracking the
content of those files, and hence being able to deduce where their
content came from, handles both this case and the general rename case.
Trying to guess which file was 'renamed' and which is 'new' when a file
is actually split into two new files would ultimately produce
misleading and incomplete information.
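The split-file problem can be illustrated with a rough Python sketch.
git's rename detection (`git diff -M`, default threshold 50%) pairs
deleted files with added files by a similarity score; here difflib's
`SequenceMatcher.ratio()` is assumed as a stand-in for git's own
similarity index, the greedy one-to-one pairing is a simplification,
and the file names are hypothetical. `detect_renames` forces a single
"rename" answer, while `content_sources` simply reports where each new
file's lines came from.

```python
from difflib import SequenceMatcher


def similarity(old_lines, new_lines):
    # difflib's ratio() is 2*M/T (M matched lines, T total lines); a stand-in
    # for git's own similarity index, which is computed differently.
    return SequenceMatcher(None, old_lines, new_lines).ratio()


def detect_renames(deleted, added, threshold=0.5):
    """Pair each added file with at most one deleted file (greedy, best score first).

    deleted, added: dict path -> list of lines. Returns dict new_path -> old_path.
    """
    candidates = sorted(
        ((similarity(old_lines, new_lines), old, new)
         for old, old_lines in deleted.items()
         for new, new_lines in added.items()),
        reverse=True,
    )
    renames, used = {}, set()
    for score, old, new in candidates:
        if score >= threshold and old not in used and new not in renames:
            renames[new] = old  # this pair is reported as a rename
            used.add(old)
    return renames


def content_sources(deleted, added):
    """Content view: for each added file, every deleted file that supplied lines to it."""
    return {
        new: sorted(
            old for old, old_lines in deleted.items()
            if any(m.size for m in SequenceMatcher(None, old_lines, new_lines).get_matching_blocks())
        )
        for new, new_lines in added.items()
    }


# Hypothetical split: util.c is deleted and its lines end up in io.c and str.c.
lines = [f"line {i}" for i in range(10)]
deleted = {"util.c": lines}
added = {"io.c": lines[:5], "str.c": lines[5:]}

print(detect_renames(deleted, added))   # only one of the two halves is called a rename
print(content_sources(deleted, added))  # both halves trace back to util.c
```

Both halves score identically here, so which one is labelled the
"rename" comes down to an arbitrary tie-break, whereas the content view
reports both honestly.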

So just because provenance matters in some situations doesn't mean it
matters in all of them (at least in the sense in which we have been
using 'matters'); furthermore, there are additional reasons why the
existing content-tracking system is beneficial. Extra layers of rename
encoding, or recording the 'heritage of data chunks', would be extra
work with little added benefit (though, from memory, there are a few
corner cases where automatic rename detection fails, so /some/ benefit
would be seen).

Regards,

Andrew Ardill
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

