On Thu, Jan 6, 2011 at 10:34 AM, Maaartin-1 <grajcar1@xxxxxxxxx> wrote:
> In theory, I could create many commits per second. I could create many
> unique paths per second, too. But I don't think it really happens. I
> know of no larger repository than git.git and I don't want to download
> it just to see how many commits, paths, and objects it contains, but
> I'd suppose it's less than one million commits, which should be
> manageable, especially when commits get grouped together as I
> described below.

In practice, commits are created every day in an active project. Paths,
on the other hand, are added less often (except perhaps in webkit).
I've got some numbers:

 - wine.git has 72k commits, 260k trees, 200k blobs, 12k paths
 - git.git has 24k commits, 39k trees, 24k blobs, 2.7k paths
 - linux-2.6.git has 160k commits, 760k trees, 442k blobs, 46k paths

Large repos are more interesting because small ones can simply be
cloned with git-clone. Listing all those commits in linux-2.6.git takes
160k * 20 bytes = ~3 MB (I suppose compression is useless because SHA-1
output is effectively random). A compressed listing of those 46k paths
takes 200 kB.

>> And commits depend on other commits, so you can't verify a commit
>> until you have got all of its parents. That also applies to files,
>> but one file's chain does not interfere with other files' chains.
>
> That's true, but the verification is something done locally on the
> client; it consumes no network traffic and no server resources, so I
> consider it cheap. I need less than half a minute (using only a
> single core) to verify the whole git.git repository (36 MB). This is
> no problem, even if it has to wait until the download finishes. I'm
> sure the OP of [1] would be happy if he could wait for this.

The point is that you need to fetch a commit's parents first in order
to verify it. Fetching a whole commit is more expensive than fetching a
file.
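The transfer-size estimate above is easy to reproduce; here is a quick back-of-the-envelope in Python (the counts are the ones quoted above for linux-2.6.git, and 20 bytes is the size of one raw SHA-1):

```python
# Rough size of a bare commit listing for linux-2.6.git,
# using the object counts quoted above.
SHA1_BYTES = 20               # one raw (binary) SHA-1 hash

commits = 160_000             # commits in linux-2.6.git
commit_listing = commits * SHA1_BYTES
print(commit_listing)         # 3200000 bytes, i.e. roughly 3 MB

# SHA-1 output is effectively random, so this listing will not
# compress; the 46k-path list is ordinary text and compresses to
# about 200 kB, per the measurement above.
```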
So while you can fetch a few base commits and then request packs built
on those bases in parallel, the cost of fetching the initial base
commits will be high.

> I see I didn't explain it clearly enough (or am missing something
> completely). I know why the packs normally used by git can't be used
> for this purpose. Let me retry: let's assume there's a commit chain
> A-B-C-D-E-F-..., the client already has commit B and requests commit
> F. It may send requests to up to 4 servers, asking for C, D, E, and
> F, respectively. The server asked for E _creates_ a pack containing
> all the information needed to create E given _all of_ A, B, C, D. As
> base for any blob/whatever in E it may choose any blob contained in
> any of these commits. Of course, it may also choose a blob already
> packed in this pack. It may not choose any other blob, so any client
> having all ancestors of E can use the pack. Different server and/or
> program versions may create different packs for E, but all of them
> are _interchangeable_. Because of this, it makes sense to _store_
> them for future reuse.

They are interchangeable as a whole, yes. But you cannot fetch half the
pack from server A and the other half from server B. You can try to
recover as many deltas as possible from a broken pack, but how do you
ask a server to send you the rest of the pack?
--
Duy
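For illustration, the scheme described in the quoted proposal could be sketched as below. This is only a model of the delta-base constraint, not real git code; `make_pack`, `fetch_chain`, and the server names are all hypothetical stand-ins:

```python
# Sketch of the proposed scheme: the client has B and wants F, so it
# asks a different server for each of C, D, E, F in parallel.  A pack
# for commit X may only use objects from X's ancestors (or objects in
# the pack itself) as delta bases, which is what makes packs built by
# different servers for the same commit interchangeable as a whole.
# All names here are hypothetical illustrations, not git internals.
from concurrent.futures import ThreadPoolExecutor

def make_pack(server, commit, ancestors):
    # Stand-in for a server building a self-contained pack for `commit`
    # whose delta bases are restricted to `ancestors`.
    return {"server": server, "commit": commit, "bases": set(ancestors)}

def fetch_chain(chain, servers, have):
    ancestors = list(have)        # commits the client already has
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = []
        for server, commit in zip(servers, chain):
            # every commit before `commit` in the chain is a legal base
            futures.append(pool.submit(make_pack, server, commit,
                                       ancestors.copy()))
            ancestors.append(commit)
        # Each pack is all-or-nothing: half a pack from one server
        # cannot be completed with the other half from another server,
        # which is the objection raised above.
        return [f.result() for f in futures]

packs = fetch_chain(["C", "D", "E", "F"],
                    ["server1", "server2", "server3", "server4"],
                    have=["A", "B"])
```

In this model the server asked for E may deltify only against A, B, C, and D, exactly as in the quoted description; the client must still apply the packs in chain order, and each pack is usable only as a whole.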