On Sun, Aug 26, 2007, Jeff King wrote:
> On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:
>
>> A very simple solution is to sendfile() existing packs if they contain
>> any objects that the client wants and let the client deal with the
>> unwanted objects. Yes this does send extra traffic over the net, but
>> the only group significantly impacted is #2 which is the most
>> infrequent group.
>>
>> Loose objects are handled as they are currently. To optimize this
>> scheme you need to let the loose objects build up at the server and
>> then periodically sweep only the older ones into a pack. Packing the
>> entire repo into a single pack would cause recent fetches to retrieve
>> the entire pack.
>
> I was about to write "but then 'fetch recent' clients will have to get
> the entire repo after the upstream does a 'git-repack -a -d'" but you
> seem to have figured that out already.
>
> I'm unclear: are you proposing new behavior for git-daemon in general,
> or a special mode for resource-constrained servers? If general behavior,
> are you suggesting that we never use 'git-repack -a' on repos which
> might be cloned?

I think that the "reuse existing packs if sensible" idea (instead of
always generating a new pack) is a good one, even if at first limited
to the clone case. There are nevertheless a few complications.

1. When this idea was discussed on the git mailing list some time ago,
somebody said that we don't need to implement a "multi pack" extension
(which, if I understand correctly, was in the design from the beginning,
to be added later); it is enough to concatenate packs. The receiving
side can then detect the boundaries between packs and split them
appropriately.

But is a concatenation of packs itself a proper pack? If not, then we
can send a concatenation of packs only if the client (receiving side)
understands it and can split it; that means checking for a protocol
extension...

2. How to detect that a request is for a clone? git-clone gets all
remote heads and then fetches the just-received heads.
But because fetching refs and fetching objects are separate steps, I
don't think we can use this sequence to detect that the client wants a
clone. We can use "no haves" as a heuristic to detect a clone request,
but "no haves" also occurs on the initial fetch of a single branch
(i.e. using a git-remote; git-fetch sequence instead of git-clone).

3. The problem with alternates mentioned by Linus is not much of a
problem, as we can simply consider packs from the alternate
repository/repositories as well. For example, with a single alternate
we would send the concatenation of packs from this repository and from
the alternate (plus a pack of the loose objects from this repository).

We would probably want some heuristic (besides configuring git-daemon)
to choose between reusing existing packs (and sending them
concatenated) and generating a pack for sending.

Note that for dumb transports we have the opposite problem, and the
opposite idea: we always send full packs over dumb transports; the
idea there was to use range downloading (available at least for the
http and ftp protocols) to download only the needed fragments of
packs.

Perhaps if some percentage of a pack (by number of objects in the
pack, or by pack size) is to be sent, then we reuse that pack and
remove the objects it contains from further consideration. No idea how
to implement that, though. Or, if the number of objects in the pack to
be sent crosses some threshold, or generating the pack / doing the
reachability analysis takes too long, then reuse existing packs.

Or you can wait for the GitTorrent protocol to be implemented, or
implement it yourself... ;-)

--
Jakub Narebski
Poland
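P.S. To make the framing question in point 1 concrete, here is a
minimal sketch (Python, not git code) of the only outer structure a
receiver can rely on without inflating every object: the 12-byte pack
header ("PACK", version, object count) and the trailing SHA-1 computed
over everything before it. A naive concatenation of two valid packs
fails exactly this check:

```python
import hashlib
import struct

def make_empty_pack():
    # Smallest syntactically valid pack: 12-byte header ("PACK",
    # version 2, zero objects) plus the SHA-1 of everything before it.
    header = b"PACK" + struct.pack(">II", 2, 0)
    return header + hashlib.sha1(header).digest()

def looks_like_one_pack(data):
    # Check only the outer framing: signature, known version, and the
    # trailing 20-byte SHA-1 over all preceding bytes.
    if len(data) < 32 or data[:4] != b"PACK":
        return False
    version, _count = struct.unpack(">II", data[4:12])
    if version not in (2, 3):
        return False
    return hashlib.sha1(data[:-20]).digest() == data[-20:]

single = make_empty_pack()
print(looks_like_one_pack(single))           # True
print(looks_like_one_pack(single + single))  # False: the trailer only
                                             # covers the second pack
```

So a plain concatenation is not a proper pack: the combined stream has
the wrong object count in its (first) header and a trailing checksum
that covers only the last segment, which is why the receiver would
need to know it is getting multiple packs and split them.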
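P.P.S. The ambiguity of the "no haves" heuristic from point 2 can be
sketched as follows; every name here is illustrative, nothing like
this exists in git itself:

```python
def classify_request(wants, haves, advertised_heads):
    # Illustrative heuristic only: "no haves" alone is ambiguous, and
    # even "no haves plus wants every advertised head" is just a guess.
    if haves:
        return "incremental fetch"
    if set(advertised_heads) <= set(wants):
        return "clone (probably)"
    return "initial fetch of selected branches"

heads = ["refs/heads/master", "refs/heads/next"]
print(classify_request(heads, [], heads))
# -> "clone (probably)"
print(classify_request(["refs/heads/master"], [], heads))
# -> "initial fetch of selected branches"
print(classify_request(heads, ["deadbeef"], heads))
# -> "incremental fetch"
```

Note that even the middle case stays a guess: a git-remote followed by
a git-fetch of every branch produces exactly the same "no haves, wants
everything" request as git-clone does.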
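P.P.P.S. The threshold idea from the last paragraphs might look like
this greedy sketch (again, all names hypothetical): reuse a pack whole
when a large enough fraction of its objects is wanted, drop those
objects from consideration, and generate one fresh pack for whatever
remains.

```python
def choose_packs(packs, wanted, threshold=0.75):
    # packs: mapping pack name -> set of object ids in that pack.
    # Reuse a pack whole when >= threshold of its objects are wanted.
    reused, remaining = [], set(wanted)
    for name, objects in packs.items():
        if objects and len(objects & remaining) / len(objects) >= threshold:
            reused.append(name)
            remaining -= objects  # these objects drop out of consideration
    return reused, remaining      # 'remaining' goes into a generated pack

packs = {
    "pack-a": {"o1", "o2", "o3", "o4"},  # 3 of 4 objects wanted
    "pack-b": {"o5", "o6"},              # 1 of 2 objects wanted
}
reused, rest = choose_packs(packs, {"o1", "o2", "o3", "o6"})
print(reused)  # ['pack-a']
print(rest)    # {'o6'} -- would go into a freshly generated pack
```

A real version would presumably weigh bytes rather than object counts
(sending 75% of a huge pack wastes more bandwidth than 75% of a tiny
one) and consider the larger, older packs first.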