On Thu, 17 May 2007, Jan Hudec wrote:

> On Thu, May 17, 2007 at 10:41:37 -0400, Nicolas Pitre wrote:
> > On Thu, 17 May 2007, Johannes Schindelin wrote:
> > > On Wed, 16 May 2007, Nicolas Pitre wrote:
> > And if you have 1) the permission and 2) the CPU power to execute such a
> > cgi on the server and obviously 3) the knowledge to set it up properly,
> > then why aren't you running the Git daemon in the first place?  After
> > all, they both boil down to running git-pack-objects and sending out the
> > result.  I don't think such a solution really buys much.
>
> Yes, it does. I had 2 accounts where I could run CGI, but not a separate
> server, at university while I studied and now I can get the same on a
> friend's server. Neither of them would probably be ok for serving a larger
> busy git repository, but something smaller accessed by several people is
> OK. I think this is quite common for university students.
>
> Of course your suggestion which moves the logic to client-side is a good one,
> but even the cgi with logic on server side would help in some situations.

You could simply wrap git-bundle within a cgi.  That is certainly easy
enough.

> > On the other hand, if the client does all the work and provides the
> > server with a list of ranges within a pack it wants to be sent, then you
> > simply have zero special setup to perform on the hosting server and you
> > keep the server load down due to not running pack-objects there.  That,
> > at least, is different enough from the Git daemon to be worth
> > considering.  Not only does it provide an advantage to those who cannot
> > do anything but http out of their segregated network, but it also
> > provides many advantages on the server side too while the cgi approach
> > doesn't.
> >
> > And actually finding out the list of objects the remote has that you
> > don't have is not that complex.  It could go as follows:
> >
> >  1) Fetch every .idx file the remote has.
>
> ... for git it's 1.2 MiB.
And that definitely isn't a huge source tree.

> Of course the local side could remember which indices it already saw during
> previous fetch from that location and not re-fetch them.

Right.  The name of the pack/index plus its time stamp can be cached.
If the remote doesn't repack too often then the overhead would be
minimal.

> >  2) From those .idx files, keep only a list of objects that are unknown
> >     locally.  A good starting point for doing this really efficiently is
> >     the code for git-pack-redundant.
> >
> >  3) From the .idx files we got in (1), create a reverse index to get each
> >     object's size in the remote pack.  The code to do this already exists
> >     in builtin-pack-objects.c.
> >
> >  4) With the list of missing objects from (2) along with their offset and
> >     size within a given pack file, fetch those objects from the remote
> >     server.  Either perform multiple requests in parallel, or as someone
> >     mentioned already, provide the server with a list of ranges you want
> >     to be sent.
>
> Does the git server really have to do so much beyond that?

Yes it does.  The real thing performs a full object reachability walk,
and only the objects that are needed for the wanted branch(es) are sent,
in a custom pack, meaning that the data transfer is really optimal.

> >  5) Store the received objects as loose objects locally.  If a given
> >     object is a delta, verify if its base is available locally, or if it
> >     is listed amongst those objects to be fetched from the server.  If
> >     not, add it to the list.  In most cases, delta base objects will be
> >     objects already listed to be fetched anyway.  To greatly simplify
> >     things, the loose delta object type from 2 years ago could be revived
> >     (commit 91d7b8afc2) since a repack will get rid of them.
> >
> >  6) Repeat (4) and (5) until everything has been fetched.
>
> Unless I am really seriously missing something, there is no point in
> repeating. For each pack you need to unpack a delta either:
>  - you have it => ok.
>  - you don't have it, but the server does =>
>    but then it's already in the fetch set calculated in 2.
>  - you don't have it and nor does the server =>
>    the repository at the server is corrupted and you can't fix it.

You're right of course.


Nicolas
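
For illustration only, steps (2) through (4) above could be sketched roughly as follows in Python. This is not code from git. The `remote_index` mapping (object name to offset within the remote pack) is a hypothetical stand-in for an already-parsed .idx file, and the function names are made up for this sketch. It assumes the pack ends with a 20-byte SHA-1 trailer and that the HTTP server honours multi-range requests.

```python
from typing import Dict, List, Set, Tuple

PACK_TRAILER_LEN = 20  # assumed: SHA-1 checksum closing the pack file


def missing_ranges(remote_index: Dict[str, int],
                   pack_size: int,
                   local_objects: Set[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte ranges of objects unknown locally."""
    # Step (3): the "reverse index". Sorting by offset lets each object's
    # extent be read off as the distance to the next object's offset; the
    # last object runs up to the pack trailer.
    by_offset = sorted(remote_index.items(), key=lambda item: item[1])
    ends = [off for _, off in by_offset[1:]] + [pack_size - PACK_TRAILER_LEN]

    ranges: List[Tuple[int, int]] = []
    for (name, start), end in zip(by_offset, ends):
        if name in local_objects:
            continue  # step (2): keep only objects we do not have
        if ranges and ranges[-1][1] + 1 == start:
            # Contiguous with the previous missing object: merge the ranges.
            ranges[-1] = (ranges[-1][0], end - 1)
        else:
            ranges.append((start, end - 1))
    return ranges


def range_header(ranges: List[Tuple[int, int]]) -> str:
    """Step (4): format the ranges as an HTTP Range header value."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    idx = {"a": 12, "b": 100, "c": 250, "d": 400}  # made-up offsets
    print(range_header(missing_ranges(idx, 520, {"b"})))
```

Merging contiguous ranges keeps the request short when many neighbouring objects are missing, which should be the common case after the remote has added a run of new objects to a pack. Delta bases (step 5) are not handled here.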