Re: Dealing with many many git repos in a /home directory

On Thu, 4 Feb 2010, demerphq wrote:

> At $work we have a host where we have about 50-100 users each with
> their own private copies of the same repos. These are cloned from a
> remote via git/ssh and are thus not automatically hardlinking their
> object stores.
> 
> This is starting to take a lot of space.

You should keep a pristine copy of that common repository on that host 
and make it readable to everyone, and then ask your users to use the 
--reference argument with 'git clone' to borrow as much as possible from 
that common repository.
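
As a rough sketch of that setup (the function names and all paths are
made up for illustration; only 'git clone' and its --reference option
come from Git itself), the shared copy and a borrowing clone could be
created like this:

```shell
#!/bin/sh
# Hypothetical helpers; names and arguments are assumptions for illustration.

# make_shared_copy REMOTE SHARED
# Create the pristine common copy and make it readable by every user.
make_shared_copy() {
    git clone --quiet "$1" "$2" &&
        chmod -R a+rX "$2"
}

# clone_borrowing SHARED REMOTE DEST
# Clone REMOTE into DEST, borrowing objects already present in SHARED.
clone_borrowing() {
    git clone --quiet --reference "$1" "$2" "$3"
}
```

An admin would run make_shared_copy once, and each user would then call
something like 'clone_borrowing /srv/git/project <remote-url> ~/project',
where /srv/git/project is an assumed location for the common copy.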

For those who have already cloned the repository in full, i.e. without
the --reference switch, the situation can be fixed simply by adding the
full path to the common repository's .git/objects directory to their
own .git/objects/info/alternates file (create it if it doesn't exist)
and then running 'git gc'.  That is what the --reference argument to
the clone command does: it sets up that .git/objects/info/alternates
file.
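
For an existing clone, the manual fix might look like the sketch below.
The function name borrow_from is hypothetical, and the path you pass in
is whatever location the common repository has on your host:

```shell
#!/bin/sh
# borrow_from SHARED_OBJECTS [REPO]
# Hypothetical helper: list SHARED_OBJECTS (a full path to the common
# repository's .git/objects directory) as an alternate of REPO, then
# run 'git gc' there.  REPO defaults to the current directory.
borrow_from() {
    shared_objects=$1
    repo=${2:-.}
    alt="$repo/.git/objects/info/alternates"

    if [ ! -d "$shared_objects" ]; then
        echo "no such directory: $shared_objects" >&2
        return 1
    fi

    # Create the info directory and the file if they don't exist yet.
    mkdir -p "$repo/.git/objects/info" || return 1

    # Append the path only if it is not already listed.
    grep -qxF "$shared_objects" "$alt" 2>/dev/null ||
        printf '%s\n' "$shared_objects" >> "$alt"

    git -C "$repo" gc --quiet
}
```

A user would run e.g. 'borrow_from /srv/git/project/.git/objects' from
inside their clone, with /srv/git/project being an assumed path.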

> I was thinking it should be possible to hardlink all of the objects in
> the different repos to a canonical single copy.
> 
> Would i be correct in thinking that if i have two repos with an
> equivalent .git/objects/../..... file in them that the files are
> necessarily identical and one can be replaced by a hardlink to the
> other?

Yes, you could do that.  However, you'll save very little by doing so,
as the bulk of a repository's content is normally stored in pack files,
and those may differ from one repository to another depending on what
exactly each pack contains.  The alternates mechanism is more powerful,
as it lets Git fetch objects from the canonical repository whether they
are packed or not.  More importantly, it avoids creating local copies of
new objects if they already exist in that canonical copy, meaning that
you don't have to constantly search every user's repository for
potential new objects to hardlink.
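
To check how much of a given repository sits in loose objects versus
pack files, something along these lines can help.  The function name is
made up; it only reformats the output of 'git count-objects -v', which
reports both sizes in KiB:

```shell
#!/bin/sh
# object_storage_summary [REPO]
# Hypothetical helper: print the on-disk size of loose objects and of
# pack files for REPO (defaults to the current directory).
object_storage_summary() {
    git -C "${1:-.}" count-objects -v |
        awk -F': ' '
            $1 == "size"      { loose = $2 }
            $1 == "size-pack" { packed = $2 }
            END { printf "loose: %s KiB, packed: %s KiB\n", loose, packed }
        '
}
```

Hardlinking individual object files can only ever reclaim space counted
under "loose".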

> If this is correct then is there some tool known to the list that
> already does this?  I whipped this together:

The "tool" exists in Git already and is what I describe above.  The 
actual tool you might need is probably a script to populate that 
.git/objects/info/alternates file in all your users' repositories and
maybe run 'git gc' on their behalf.
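
Such a script could be as small as the sketch below.  Everything in it
is an assumption to adapt: the function name, the idea that user
repositories live at most a couple of levels below each home directory,
and the use of find's -maxdepth option.  It also does not check that a
repository is actually a clone of the common one, and if it is run as
another user (e.g. root) the file ownership and permissions inside each
repository may need attention afterwards.

```shell
#!/bin/sh
# link_all_repos SHARED_OBJECTS HOME_ROOT
# Hypothetical helper: for every .git directory found under HOME_ROOT,
# list SHARED_OBJECTS (full path to the common .git/objects directory)
# in its alternates file and run 'git gc' on it.
link_all_repos() {
    shared_objects=$1
    home_root=$2

    find "$home_root" -maxdepth 3 -type d -name .git |
    while IFS= read -r gitdir; do
        # Never make the common repository an alternate of itself.
        [ "$gitdir/objects" = "$shared_objects" ] && continue

        alt="$gitdir/objects/info/alternates"
        mkdir -p "$gitdir/objects/info" || continue

        # Append the path only if it is not already listed.
        grep -qxF "$shared_objects" "$alt" 2>/dev/null ||
            printf '%s\n' "$shared_objects" >> "$alt"

        git --git-dir="$gitdir" gc --quiet
    done
}
```

For example: 'link_all_repos /srv/git/project/.git/objects /home', again
with /srv/git/project standing in for wherever the common copy lives.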


Nicolas