On Mon, 23 Feb 2009, Shawn O. Pearce wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Nicolas Pitre <nico@xxxxxxx> writes:
>>> On Sun, 22 Feb 2009, Miklos Vajna wrote:
>>>>
>>>> http://thread.gmane.org/gmane.comp.version-control.git/55254/focus=55298
>>>>
>>>> Especially Shawn's message, which can be a base for your proposal, if
>>>> you want to work on this.
>>>
>>> I don't particularly agree with Shawn's proposal.  Reliance on a stable
>>> sorting on the server side is too fragile, restrictive and cumbersome.
>
> We already rely on a stable sort in the tree format.
[...]

By 'sorting order' I (and Nicolas) mean here the ordering of objects and
deltas in the pack file, i.e. whether we get _exactly_ the same (byte
for byte) packfile for the same want/have exchange (your proposal), or
even for the same arguments to git-pack-objects (which is a necessary,
although I think not sufficient, condition).

[...]
>> I think it is possible for dumb protocols (using commit walkers) and
>> for (deprecated) rsync.
>
> Yes, it is possible for the commit walkers to implement a restart,
> as they are actually beginning at the current root and walking back
> in history.  Resuming a large file like a pack is easy to do on HTTP
> if the remote server supports byte range serving.  It's also easy
> to validate on the client that the pack wasn't repacked during the
> idle period (between initial fetch and restart), just validate the
> SHA-1 footer.  If the pack was repacked and came up with the same
> name you'll have a mismatch on the footer.  Discard and try again.

Can we assume that packfiles are named correctly (i.e. that the name of
a packfile matches its SHA-1 footer)?

> And if you want to save bandwidth, always grab the last 20 bytes
> of the file before getting any other parts, save it somewhere,
> and revalidate that last 20 before resuming.  If it's changed,
> you should discard what you have and start over from the beginning.
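
To make the check above concrete, here is a minimal sketch of the
client-side validation, assuming the pack layout in which the final
20 bytes are the SHA-1 of everything before them.  The function names
are hypothetical (they are not part of git), and the sketch works on
bytes already in hand; the HTTP byte-range fetching is left out.

```python
import hashlib

TRAILER_LEN = 20  # a SHA-1 digest is 20 bytes


def pack_trailer_ok(pack_bytes: bytes) -> bool:
    """Return True if the last 20 bytes equal the SHA-1 of all bytes before them."""
    if len(pack_bytes) < TRAILER_LEN:
        return False
    body = pack_bytes[:-TRAILER_LEN]
    trailer = pack_bytes[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer


def can_resume(saved_trailer: bytes, current_trailer: bytes) -> bool:
    """Resume only if the remote file still ends with the trailer saved earlier.

    saved_trailer: last 20 bytes recorded before the first download attempt.
    current_trailer: last 20 bytes fetched again just before resuming.
    """
    return len(saved_trailer) == TRAILER_LEN and saved_trailer == current_trailer
```

A client would call can_resume() before asking for the remaining byte
range, and pack_trailer_ok() once the whole file has arrived; if either
returns False, it discards the partial file and starts over.
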
Therefore I think that restartable clone for "dumb" (commit walker)
protocols is an easy GSoC project, while restartable clone for "smart"
(generate packfile) protocols is at least of medium difficulty, and
might be harder.

>>> I think restartable clone is a really bad suggestion for SOC students.
>>> After all we want successful SOC projects, not ones that even core git
>>> developers did not yet find a good solution for.
>>>
>>> IMHO of course.
>>
>> But I agree that within current limits (as far as I know there is no
>> way to ask for SHA-1; you can only ask for refs for security reasons)
>> it would be difficult to very difficult to add restartable clone
>> support to native (smart) protocols.
>>
>> If not for this limitation it would be, I think, possible to do a kind
>> of fsck, checking which commits in packfile are complete (i.e. have
>> all objects), and based on that ask for a subset of objects.  This would
>> require support only from a client... alas, this is not possible.
>
> I think the current "must want advertised ref" restriction is
> too strict.  If you make the server check the reachability of the
> wanted object, (assuming it can be resolved to a commit) then you
> can pick up in the middle of history.  We already (to some extent)
> support that with the deepen thing in a shallow clone.  Sure, it
> may cause more server load when clients ask for this partial fetch.

Hmmm... I forgot about shallow clone.  Still, we can have the following
situation:

  *---*---o---.---.---. .... .---o---*---*   <-- some ref
      ^                                  ^
      |                                  |
      a                                  b

where '*' means that we have the commit and all its objects fully in
the packfile (i.e. if they are deltas, the base for each delta is in the
packfile), 'o' means incomplete, for example a commit with some blobs
missing, and '.' means a missing commit object.

Because git deals with contiguous ranges, on restart of a clone we can
tell the server that we have 'a' and that we want 'b', but nothing
more.  Doing better would need further extensions to the git protocols,
where we could tell that we have some objects (to exclude) without
implying anything about their prerequisites; something that, if I
remember correctly, was implemented in some floating 'lazy clone' patch
(well, lazy loading of blobs patch)...

[...]
> So, IMHO, the restriction that a commit must be advertised, and not
> merely reachable, is overly strict and doesn't buy us a whole lot.
>
>> I think that unless 'restartable clone' is limited to commit walkers
>> (HTTP protocol etc.) it should be moved up in difficulty from the
>> "New to Git?" section.  I guess that mirror-sync, formerly GitTorrent,
>> could be easier to implement.
>
> Maybe.  But a simple stable sort on the objects makes it easier,
> perhaps within reach of "new to git".

As Nico said, in the presence of threaded packing the ordering of
_objects_ in the _packfile_ might not be deterministic.

> That ideas page is a wiki for a reason.  If folks feel differently
> from me, please edit it to improve things!  :-)

I'll try to add a 'pack file cache for git-daemon' proposal to the
GSoC2009Ideas page... but I cannot be mentor (or even co-mentor) for
this idea.

-- 
Jakub Narebski
Poland