On Mon, Aug 13, 2012 at 2:49 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> For example, the reachability bitmap would want to say something
> like "Traversing from commit A, these objects in this pack are
> reachable." The bitmap for one commit A would logically consist of
> N bits for a packfile that stores N objects (the resulting bitmap
> needs to be compressed before going to disk, perhaps with RLE or
> something). With the single "sorted by SHA-1" table, we can use the
> index in that single table to enumerate all reachable objects of any
> type in one go. With four separate tables, on the other hand, we
> would need four bitmaps per commit.

No, we still need only one per commit. The n-th bit follows the order of
the objects in the pack, not the index. How the SHA-1s are sorted does
not matter.

> Either way is _possible_, but I think the former is simpler, and the
> latter makes it harder to introduce new types of objects in the
> future, which I do not think we have examined possible use cases
> well enough to make that decision to say "four types is enough
> forever".

New types can be put in one of those four tables, depending on their
purpose. The reason I split the tables is that I care particularly about
commits and trees. If a new type serves the same purpose as a tree, for
example, then it is better put in the tree table...

> In either way, we would have such bitmap (or a set of four bitmaps
> in your case) for more than one commit (it is not necessary or
> desirable to add the reachability bitmap to all commits), and such a
> "reachability extension" would need to store a sequence of "the
> commit object name the bitmap (or a set of four bitmaps) is about,
> and the bitmap (or set of four bitmaps)". That object name does not
> have to be 20-byte but would be a varint representation of the
> offset into the "sorted by SHA-1" table.

How do you reach the bitmap, given its commit SHA-1?

> That varint representation
> would be smaller by about 3.5 bits if you have a separate "commit
> only, sorted by SHA-1" table (as the number of all objects tend to
> be 10x larger than the number of all commits that need them). For
> the particular case of "we want to only annotate the commits, never
> other kinds of objects" use case, it would be a win. But without
> knowing what other use cases we will want to use the "object
> annotation in the pack index file" mechanism for, it feels like a
> premature optimization to me to have four tables to shave 3.5 bits
> per object.

One other use case: caching trees for faster traversal in the general
case (sort of like pack v4, but it comes as a cache instead of replacing
the real pack).
--
Duy
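
To illustrate the point about bit order: a minimal sketch, not git code, of a single per-commit bitmap whose n-th bit refers to the n-th object in pack order. The toy pack, the object names and the helper names are all made up for the example; the only thing it is meant to show is that one bitmap covers objects of every type and that the SHA-1 sort order never enters into it.

```python
# Hypothetical toy pack: objects listed in the order they are stored in
# the pack (NOT sorted by SHA-1). c* = commits, t* = trees, b* = blobs.
PACK_ORDER = ["c2", "c1", "t2", "t1", "b2", "b1"]

# Hypothetical object graph: each object maps to the objects it references.
GRAPH = {
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t2": ["b1", "b2"],
    "t1": ["b1"],
    "b1": [],
    "b2": [],
}


def reachability_bitmap(commit, pack_order, graph):
    """Return an int whose n-th bit is set when the n-th object in the
    pack is reachable from `commit` (the commit itself included)."""
    position = {name: n for n, name in enumerate(pack_order)}
    seen = set()
    stack = [commit]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph[obj])
    bits = 0
    for obj in seen:
        bits |= 1 << position[obj]
    return bits


def objects_in(bits, pack_order):
    """Expand a bitmap back into object names, in pack order."""
    return [name for n, name in enumerate(pack_order) if (bits >> n) & 1]


if __name__ == "__main__":
    bitmap = reachability_bitmap("c1", PACK_ORDER, GRAPH)
    print(bin(bitmap), objects_in(bitmap, PACK_ORDER))
```

A real implementation would compress the bitmap before writing it out (RLE or similar, as suggested above); the sketch keeps it as a plain integer.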