On Wed, Oct 30, 2013 at 8:21 AM, Jeff King <peff@xxxxxxxx> wrote: > On Fri, Oct 25, 2013 at 02:14:11PM +0000, Shawn O. Pearce wrote: >> On Thu, Oct 24, 2013 at 6:04 PM, Jeff King <peff@xxxxxxxx> wrote: >> > For bitmaps to be used, the following must be true: >> > >> > 1. We must be packing to stdout (as a normal `pack-objects` from >> > `upload-pack` would do). >> > >> > 2. There must be a .bitmap index containing at least one of the >> > "have" objects that the client is asking for. >> >> The client must explicitly "have" a commit that has a bitmap? In JGit >> we allow the client to have anything, and walk backwards using >> traditional graph traversal until a bitmap is found. > > If the bitmaps contain the full set of reachable objects and the client > does not have any "haves" that are bitmapped , then we know that either: > > 1. Their "haves" are not reachable from the "wants" > > or > > 2. Their "wants" are not bitmapped, and so the slice of "haves..wants" > has no bitmaps > > Since (1) is relatively rare, I think we are using this as a proxy for > (2), so that we can do a regular walk rather than looking around for > bitmaps that don't exist. But I may be misremembering the reasoning. > Vicent? Ah. I am not sure if we do this in JGit. I think JGit's approach is to look if the have appears in a pack with bitmaps, this is a simple lookup in the .idx file and does not require expanding any data from the .bitmap file. But it wasn't my question. :-) Client sends "want B ; have E". What if E appears in the bitmapped pack, but does not itself have a bitmap? Do you walk backwards from B and switch to the bitmap algorithm when you find a commit that has a bitmap and that bitmap contains E? >> In JGit we write the to_pack list first, then the reuse pack. Our >> rationale was the to_pack list is recent objects that are newer and >> would appear first in a traditional traversal, so they should go at >> the front of the stream. This does mean if they delta compress against >> an object in that reuse_packfile slice they have to use REF_DELTA >> instead of OFS_DELTA. > > That's a good point. In our case I think we do not delta against the > reused packfile objects at all, as we simply send out the whole slice of > packfile without making an entry for each object. JGit also doesn't use the reused packfile as delta bases, because there are no object entries to shove through the delta window. So there is never any risk of a reference to a base later in the file. It also means that "thin pack" at the front of the stream is less optimally compressed. At Google we side step that by doing GC at the server very often, to try and keep the number of objects in that pack low. It might make sense to use a commit that covers the majority of the reused pack as the edge base candidate case during delta compression here, as though the client had sent us a "have" for that commit. I don't think we have tried this in JGit. It would make deltas use REF_DELTA, but the delta size has to be smaller than the REF_DELTA header to be used in the stream so its still a smaller overall transfer. >> Is this series running on github.com/torvalds/linux? Last Saturday I >> ran a live demo clone comparing github.com/torvalds/linux to a JGit >> bitmap clone and some guy heckled me because GitHub was only a few >> seconds slower. :-) > > It is. Use kernel.org if you want to make fun of someone. :) Hah. OK, so GitHub was only a few seconds slower only because my desktop is better connected to our data centers than to yours. Nicely done, this patch series really works. :) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html