Re: A Basic Git Question About File Tracking [ANSWERED]

Jakub Narebski <jnareb@xxxxxxxxx> · Sun, 9 Oct 2011 11:37:33 +0200

On Sun, 9 Oct 2011, Jon Forrest wrote:
> On 10/8/2011 6:17 PM, Jakub Narebski wrote:
> 
> > You seem to be under [false] impression that git commit is about
> > _changes_ / _changeset_.
> 
> This is correct. The Pro Git book says:
> 
> "You stage these modified files and then commit
> all your staged changes"
> 
> Plus, even "git status" tells me
> 
> $ git status
> # On branch master
> # Changes to be committed:

Well, that is because the two representations, the delta / changeset
("differential") representation and the snapshot ("integral")
representation, are related, and [in practice] one can be transformed
into the other.

Sometimes it is better to think of a commit as representing a changeset
relative to its parent commit; sometimes it is better to think of a commit
as a snapshot of the state of the project.

But under the hood, git's model is snapshot-based.
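
To illustrate how one representation can be derived from the other, here
is a rough Python sketch. It is not how git implements anything: a
snapshot is modeled as a plain dict from path to content identifier, and
the names diff() and apply() are made up for this example.

```python
from typing import Dict, Optional, Tuple

# Toy model: a snapshot maps each path to an identifier of its contents.
Snapshot = Dict[str, str]
# A changeset maps each changed path to (old id, new id); None means "absent".
Changeset = Dict[str, Tuple[Optional[str], Optional[str]]]


def diff(parent: Snapshot, child: Snapshot) -> Changeset:
    """Derive the changeset ("differential") from two snapshots."""
    changes: Changeset = {}
    for path in sorted(parent.keys() | child.keys()):
        old, new = parent.get(path), child.get(path)
        if old != new:
            changes[path] = (old, new)
    return changes


def apply(parent: Snapshot, changes: Changeset) -> Snapshot:
    """Rebuild the child snapshot ("integral") from parent plus changeset."""
    result = dict(parent)
    for path, (_old, new) in changes.items():
        if new is None:
            result.pop(path, None)   # path was deleted
        else:
            result[path] = new       # path was added or modified
    return result
```

Given a parent snapshot, apply(parent, diff(parent, child)) gives back
child, which is the sense in which the two views carry the same
information.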

> But I see my error. Below is what I hope is a clear
> explanation of what I didn't understand. It presumes
> that the reader understands the git objects model.
> Please let me know if anything is incorrect.
> ----------
> When you "git add" a file two things happen:
> 
> 1) The file is copied to the git objects tree.

Actually, it is the file's _contents_ that are copied to the git object _store_.

> This location where the file is copied depends
> on the hash of the file's content.

I'd say that this is an unnecessary implementation detail of the "loose"
object format.  I would rather say that the _identifier_ of the added
object is based on its contents.
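
As a small illustration of "identifier based on contents", the following
Python sketch computes the name git gives to a blob. It assumes a
repository using the default SHA-1 object format; the function name
git_blob_id() is made up for this example.

```python
import hashlib


def git_blob_id(contents: bytes) -> str:
    """Return the SHA-1 object name of a blob holding these contents."""
    # The hashed data is a short header ("blob <size>" plus a NUL byte)
    # followed by the raw contents.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


# The file name is not an input: identical bytes always get the same id.
print(git_blob_id(b"test content\n"))
print(git_blob_id(b""))
```

The result should match what "git hash-object <file>" prints for a file
with the same bytes. Where the object ends up on disk (loose or packed)
is a separate matter.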

> 
> 2) An entry for the file is added to the git index.
> This entry includes the same hash that was mentioned
> in #1.

Yes.

> A tracked file has an entry in the git index file.

Yes.

> A copy of the file also exists in the objects tree.

A copy of the _contents_ of a file at a specific point in time
exists in the object _store_ (not necessarily in the objects tree,
as it can be packed).

> When you run 'git status', git computes the hash of
> every file in your working directory and looks
> up each file in the index. If the file isn't found
> then the file is shown as untracked.

Sidenote: git also stores a file's stat data (modification time etc.)
in the index, so it can avoid recomputing the hash of every file.
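
The idea can be sketched in Python as below. This is a heavily
simplified model, not git's actual code: IndexEntry, make_entry() and
is_modified() are hypothetical names, only modification time and size
are compared, and the real index compares more stat fields and has to
deal with timestamp races.

```python
import hashlib
import os
from typing import NamedTuple


class IndexEntry(NamedTuple):
    # Hypothetical, reduced index entry: cached stat data plus blob id.
    mtime_ns: int
    size: int
    blob_id: str


def blob_id(contents: bytes) -> str:
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


def make_entry(path: str) -> IndexEntry:
    """What "git add" would record for the file in this toy model."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return IndexEntry(st.st_mtime_ns, st.st_size, blob_id(f.read()))


def is_modified(path: str, entry: IndexEntry) -> bool:
    st = os.stat(path)
    if (st.st_mtime_ns, st.st_size) == (entry.mtime_ns, entry.size):
        return False  # stat data matches: assume unchanged, skip hashing
    # Stat data differs: only now read and hash the file to be sure.
    with open(path, "rb") as f:
        return blob_id(f.read()) != entry.blob_id
```

A file whose stat data matches the index entry is never read, which is
what keeps a status check cheap in a large working directory.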

> When you do a commit, the hash values of everything
> in the index are copied into a tree object. The hash
> value of the tree object is then placed in a commit object.

True, though I would probably state it a bit differently.
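
For the curious, here is a rough Python sketch of how a tree object's
identifier follows from the entries recorded in the index. It assumes
the SHA-1 object format and a flat project with only regular,
non-executable files (mode 100644) and no subdirectories; object_id()
and tree_id() are names made up for this example.

```python
import hashlib
from typing import Dict


def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 object name: hash of "<kind> <size>", a NUL byte, then the body."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(index: Dict[str, str]) -> str:
    """index maps file name -> blob id (hex), as recorded by "git add"."""
    body = b""
    # Entries are sorted by name; each is "<mode> <name>", a NUL byte,
    # then the blob id as 20 raw bytes. No file contents are read here.
    for name in sorted(index, key=lambda n: n.encode("utf-8")):
        body += b"100644 " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(index[name])
    return object_id(b"tree", body)
```

The commit object then records the tree's identifier on a "tree" line,
together with parent, author and committer information, and is hashed
the same way with kind "commit".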

> No copies of tracked files in the working directory are
> made at commit time. This is because the files were already
> copied into the objects tree when 'git add' was run.
> This is one reason why git commits are so fast.

Well, there is also "git commit -a", which stages modified tracked files
at commit time, but it is true that git copies into the object store only
those tracked files that have changed.

Also, I think the main reason that git commits are fast is that they
are a local operation and do not go over the network, as they do in
centralized version control systems.

-- 
Jakub Narebski
Poland