Re: git-daemon on NSLU2

On Sun, 26 Aug 2007, Jon Smirl wrote:
> 
> Changing git-daemon only for the initial clone case also means that
> people don't need to change the way they manage packs.

I do agree that we might want to do some special-case handling for the 
initial clone (because it *is* kind of special), but it's not necessarily 
as easy as just re-using an existing pack.

At a minimum, we'd need to have something that knows how to make a single 
pack out of several packs and some loose objects. That shouldn't be 
*hard*, but it's certainly nontrivial, especially in the presence of the 
same objects possibly being available more than once in different packs.

[ The "duplicate object" thing does actually happen: even if you use only 
  "git native" protocols, you can get duplicate objects because a file was 
  changed back to an earlier version. The incremental packs you get from 
  push/pull'ing between two repositories try to send the minimal 
  incremental changes, but the keyword here is _try_: they will 
  potentially send objects that the receiver already has, if it's not 
  obvious that the receiver has them from the "commit boundary" cases ]
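
As a rough illustration of the "single pack out of several packs and some loose objects" step, here is a minimal Python sketch. It assumes a toy model in which a pack is just a dict from object ID to content. Real packs are delta-compressed binary files, so `object_id` and `merge_object_sources` are hypothetical helpers that only show the deduplication, not git's actual code or on-disk format.

```python
import hashlib


def object_id(data: bytes) -> str:
    # Hypothetical ID for this toy model: SHA-1 of the raw content.
    # (Real git also hashes a type/size header in front of the content.)
    return hashlib.sha1(data).hexdigest()


def merge_object_sources(packs, loose_objects):
    """Combine several toy packs plus loose objects, keeping each object once."""
    merged = {}
    duplicates = 0
    for source in [*packs, loose_objects]:
        for oid, data in source.items():
            if oid in merged:
                # Same object available from more than one source: skip it.
                duplicates += 1
                continue
            merged[oid] = data
    return merged, duplicates


v1 = b"version 1\n"
v2 = b"version 2\n"
new = b"new file\n"

pack_a = {object_id(v1): v1, object_id(v2): v2}
pack_b = {object_id(v1): v1}  # file changed back to v1 and was sent again
loose = {object_id(new): new}

merged, duplicates = merge_object_sources([pack_a, pack_b], loose)
print(len(merged), "unique objects,", duplicates, "duplicate(s) skipped")
```

The sketch drops duplicates on the sending side, which sidesteps the question of whether the receiver copes with them; it says nothing about the cost of doing this on real packs.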

Maybe the client side will handle a pack with duplicate objects perfectly 
fine, and it's not an issue. Maybe. It might even be likely (I can't think 
of anything that would obviously break). But at a minimum, it would be 
something that needs some code on the sending side, and a lot of 
verification that the end result works ok on the receiving side.

And there's actually a deeper problem: the current native protocol 
guarantees that the objects sent over are only those that are reachable. 
That matters. It matters for subtle security issues (maybe you are 
exporting some repository that was rebased, and has objects that you 
didn't *intend* to make public!), but it also matters for issues like git 
"alternates" files.
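
To make the reachability point concrete, here is a small Python sketch under simplifying assumptions: the object database is modelled as a dict from an object name to the names it points at, and every name in it (`commit2`, `old_commit`, `blob_secret`, ...) is made up for illustration. It shows that a walk from the refs never touches objects left behind by a rebase, whereas shipping a whole existing pack would include them.

```python
def reachable_objects(refs, graph):
    """Return every object name reachable from the given refs."""
    seen = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        # Follow commit -> parent/tree and tree -> blob edges.
        stack.extend(graph.get(oid, ()))
    return seen


# Toy object graph (hypothetical names, not real object IDs).
graph = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree2": ["blob_b"],
    "tree1": ["blob_a"],
    "old_commit": ["commit1", "tree_secret"],  # orphaned by a rebase
    "tree_secret": ["blob_secret"],
    "blob_a": [],
    "blob_b": [],
    "blob_secret": [],
}
refs = {"refs/heads/master": "commit2"}

to_send = reachable_objects(refs, graph)
unreachable = set(graph) - to_send
print("sent:", sorted(to_send))
print("never sent:", sorted(unreachable))
```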

If you only ever look at a single repo, you'll never see the alternates 
issue, but if you're seriously looking at serving git repositories, I 
don't really see the "single repo" case as being at all the most common or 
interesting case. 

And if you look at something like kernel.org, the "alternates" thing is 
*much* more important than how much memory git-daemon uses! Yes, 
kernel.org would probably be much happier if git-daemon wasn't such a 
memory pig occasionally, but on the other hand, the win from using 
alternates and being able to share 99% of all objects in all the various 
related kernel repositories is actually likely to be a *bigger* memory win 
than any git-daemon memory usage, because now the disk caching works a 
hell of a lot better!

So it's not actually clear how the initial clone thing can be optimized on 
the server side.

It's easier to optimize on the *client* side: just do the initial clone 
with rsync/http (and "git gc" it on the client afterwards), and then 
change it to the git native protocol after the clone.
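
The client-side workaround could look roughly like the Python sketch below. It only builds the command lines and does not run them. The URLs and directory name are placeholders, the function name is hypothetical, and the use of `git -C` and `git remote set-url` is an assumption about a reasonably modern git client rather than something spelled out above.

```python
def client_side_clone_commands(http_url, git_url, dest):
    """Build the three steps: clone over http, repack locally, switch protocol."""
    return [
        # 1. Initial clone over the "dumb" transport, so the server does no packing.
        ["git", "clone", http_url, dest],
        # 2. Repack on the client, where memory is less of a problem.
        ["git", "-C", dest, "gc"],
        # 3. Point the remote at the native protocol for later fetches.
        ["git", "-C", dest, "remote", "set-url", "origin", git_url],
    ]


commands = client_side_clone_commands(
    "http://example.org/project.git",  # placeholder URL
    "git://example.org/project.git",   # placeholder URL
    "project",
)
for argv in commands:
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` to actually perform the step.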

That may not sound very user-friendly, but let's face it, I think there is 
exactly one person in the whole universe that tries to use an NSLU2 as a 
git server. So the "client-side workaround" is likely to affect a very 
limited number of clients ;)

		Linus