Re: git clone, hardlinks and multiple users?

On 1/20/2012 11:31 AM, Marc Herbert wrote:
> Hi,
>
> "git clone" is using hardlinks by default, even when cloning from a
> different user. In such a case the clone ends up with a number of
> files owned by someone else.

(I assume you're using Linux.) It sounds like you specified a local path
of the form /path/to/repo.git in your git-clone, which tells git to use
hardlinks. If you want your own copies, then specify
file:///path/to/repo.git in git-clone, or keep the local path and pass
--no-hardlinks (see the git-clone manpage section "GIT URLS":
http://schacon.github.com/git/git-clone.html).
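
To see which of the two behaviours you actually got, you can compare
inode numbers between the two object stores. Below is a minimal Python
sketch, not anything git ships: the function names and the directory
layout in demo() are made up for illustration, and demo() builds a toy
directory tree instead of running git.

```python
import os
import tempfile


def file_ids(root):
    """Collect (device, inode) pairs for every file under root."""
    ids = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            ids.add((st.st_dev, st.st_ino))
    return ids


def count_shared_objects(origin_objects, clone_objects):
    """Return (shared, total) counts of distinct inodes under clone_objects.

    'shared' is how many of them are also reachable under origin_objects,
    i.e. how many files are hardlinks rather than independent copies.
    """
    origin_ids = file_ids(origin_objects)
    clone_ids = file_ids(clone_objects)
    return len(origin_ids & clone_ids), len(clone_ids)


def demo():
    """Toy layout (hypothetical names): one hardlinked file, one real copy."""
    with tempfile.TemporaryDirectory() as tmp:
        origin = os.path.join(tmp, "origin", "objects")
        clone = os.path.join(tmp, "clone", "objects")
        os.makedirs(os.path.join(origin, "ab"))
        os.makedirs(os.path.join(clone, "ab"))
        with open(os.path.join(origin, "ab", "linked"), "wb") as f:
            f.write(b"x")
        # Same inode, second name: what a local-path clone does for objects.
        os.link(os.path.join(origin, "ab", "linked"),
                os.path.join(clone, "ab", "linked"))
        # Independent file: what a file:// clone would give you.
        with open(os.path.join(clone, "ab", "copied"), "wb") as f:
            f.write(b"x")
        return count_shared_objects(origin, clone)


if __name__ == "__main__":
    print(demo())
```

On real repositories you would pass the two .git/objects directories to
count_shared_objects(); a shared count of zero means the clone has its
own copies.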

> Since only immutable objects are cloned this seems to work fine.
> However I would like to know if this "multiple users" case works by
> chance or by specification.

(I'm not an expert on hardlinks, Linux metadata, or git, and haven't
used hardlinks at all with Linux or git yet, but do have some experience
with git and permissions.)  I think if you plan your permissions to be
based on a primary group then it will "just work".  If it's not as simple
as a single primary group, then read on for my non-expert conversational
input, or at least skim through for pointers to the reliable manpage
references...

It sounds like part of your question may actually be a hardlink
question, so perhaps this info on hardlinks is useful to you:
http://linfo.org/hard_link.html. In regard to git, it does not track
metadata. It will track "permissions" if you tell it to, but even then
it only tracks the executable bit, to determine whether the file is
stored in the git repo as executable or non-executable. If you are
"changing" the metadata because you modified the file contents (or
executable bit), then you are creating a new object (in git) and not
modifying the original hardlinked object (in git or Linux) or its
metadata (in Linux).

I assume the working tree (i.e., WORKTREE/ of the WORKTREE/.git repo) of
the clone is indeed a full copy of the files via git-checkout, because
the manpage only claims to use hardlinks for the object store (i.e.,
.git/objects/) to save disk space on the clone of the object store, not
for the checkout of the worktree. Worktree files only get written to the
object store if you stage them to the index (git-add). Then they are
stored in .git/objects/ according to the sha-1 of their contents.
Therefore, if your worktree copy has a different owner and you don't
modify the contents or executable bit, then there is nothing for you to
stage, because git does not detect a difference in content or executable
bit. On the other hand, if you change the contents or the executable
bit, then git will consider that a change and update the object store,
but it will be a new object and not the object representing the previous
version you hardlinked to when you cloned.

If that new object is then in turn pushed to the origin repo and someone
else clones it using hardlinks, then they may very well not be able to
access that object if its owner:group excludes them. More likely, if
someone pushes an object with bad permissions, then others will get push
errors, because git stores objects in subdirs named after the first two
chars of the sha-1, which means other objects in that subdir will also
be inaccessible.

If you change permissions in regard to the executable bit on your files
without editing contents, then I don't know if git will make a new copy
or modify the original inode, because I'm not sure if the executable bit
is represented in the sha-1 of the contents or not.

In the git-init manpage there are options for permissions/sharing under
the --shared option (not to be confused with the --shared option of
git-clone, which is totally different). The git-clone equivalent appears
to be "git-clone --config core.sharedRepository=<your-value>". Maybe
these core.sharedRepository settings in git are smart enough to handle
the hardlink shared inode metadata confusion.
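
For what it's worth, the content addressing and the two-character
subdirs can be sketched in a few lines of Python. This assumes the
default SHA-1 object format; the function names are mine, not git's. As
far as I understand it, the blob id is the sha-1 of a short "blob
<size>" header plus the file bytes, so owner and mode never enter the
hash, and the executable bit is recorded in the tree entry that points
at the blob instead.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id git assigns to a blob holding these bytes (SHA-1 repos).

    Only the bytes are hashed: file owner, group, mode and timestamps are
    not inputs, so changing them alone cannot produce a different blob.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object relative to the repo: first two hex chars
    name the subdir, the remaining 38 name the file inside it."""
    return ".git/objects/%s/%s" % (object_id[:2], object_id[2:])


if __name__ == "__main__":
    empty = git_blob_id(b"")
    print(empty)
    print(loose_object_path(empty))
```

Running "git hash-object" on the same bytes should give the same id, and
editing the contents gives a different id and therefore a different
file, which is why the original hardlinked object is left alone.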

> In other words, is there a guarantee that no later version of git or
> no obscure option I haven't used yet will ever try to touch a
> hardlink in any way, for instance by trying to update some metadata
> timestamp, or overwriting it with the same value for lack of
> optimization, or any other kind of side effect that would obviously
> fail?

However, if you cd to .git/objects/ and use chmod to change the
permission directly then I think it would change the permissions on the
inodes your origin is storing as loose objects.  I'm not sure what it
would do for packed objects. There are clone options like --shared and
--reference that have special notes on the manpage explaining how you
could break things if you don't know what you're doing (that would
include hardlinks but is not exclusive to hardlinks).
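
The chmod point is easy to check outside of git. Here is a small Python
sketch with made-up file names standing in for a loose object in the
origin and its hardlink in the clone; it assumes a filesystem that
supports hardlinks and a user who owns the file.

```python
import os
import stat
import tempfile


def chmod_via_hardlink():
    """chmod one name of a hardlinked file and inspect it via the other.

    Returns (same_inode, mode_seen_via_origin, link_count).
    """
    with tempfile.TemporaryDirectory() as tmp:
        origin = os.path.join(tmp, "origin_object")  # hypothetical name
        clone = os.path.join(tmp, "clone_object")    # hypothetical name
        with open(origin, "wb") as f:
            f.write(b"data")
        os.chmod(origin, 0o444)
        os.link(origin, clone)   # second name for the same inode
        os.chmod(clone, 0o400)   # change permissions through the clone's name
        st_origin = os.stat(origin)
        st_clone = os.stat(clone)
        return (
            st_origin.st_ino == st_clone.st_ino,
            stat.S_IMODE(st_origin.st_mode),
            st_origin.st_nlink,
        )


if __name__ == "__main__":
    same, mode, links = chmod_via_hardlink()
    print(same, oct(mode), links)
```

Mode, owner and timestamps live in the inode, not in the directory
entry, so the change made through the clone's name is what the origin
sees too.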

Hope this helps in some way. Perhaps someone better informed will
provide a more accurate and/or clear answer. Let me know what you find
out because I too will have to become more concerned about disk space
and clone optimization in the very near future.

v/r,
neal