On Sun, Mar 22, 2020 at 02:45:19PM -0400, Jeff King wrote: > 3. To find the type of a delta object, we have to walk back to a base > object with a concrete type. When we access the actual object > contents, we cache intermediate bases to avoid digging down the same > chain over and over. But I don't think there's any such cache for the > type lookup. So if you have a 50-deep delta chain and want the type > of each object, you'll walk down 49 links, then 48 links, and so on. > It probably wouldn't be that hard to have a small type cache (or just > allow the existing cache to have entries for "I know the type, but > don't have the contents"). > > I suspect (1) would give the biggest speedup, but is more work and > requires bitmaps. I think (3) is likely to give a moderate win and > probably isn't that big a patch. Sadly, it doesn't seem to help. Here's a hacky implementation (I tested it only on my limited cat-file case; I wouldn't be at all surprised if it causes problems for users of the delta cache that want the actual object contents). After some rudimentary profiling (which I _should_ have done in the first place before opening my mouth), it looks like finding the types isn't actually the most expensive thing anyway: it's just dealing with the sheer number of objects in the pack. That implies that teaching packed_object_info() to use bitmaps to find the individual types wouldn't help much either (but being able to ask "just give me the commits" and iterating over the commit-type bitmap probably _would_). Anyway, here's the patch for the curious. --- diff --git a/packfile.c b/packfile.c index f4e752996d..482733689a 100644 --- a/packfile.c +++ b/packfile.c @@ -1273,6 +1273,13 @@ static int retry_bad_packed_offset(struct repository *r, #define POI_STACK_PREALLOC 64 +/* some reordering of static functions could avoid these */ +static enum object_type get_delta_base_cache_type(struct packed_git *p, + off_t curpos); +static void add_delta_base_cache(struct packed_git *p, off_t base_offset, + void *base, unsigned long base_size, + enum object_type type); + static enum object_type packed_to_object_type(struct repository *r, struct packed_git *p, off_t obj_offset, @@ -1287,6 +1294,14 @@ static enum object_type packed_to_object_type(struct repository *r, while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { off_t base_offset; unsigned long size; + enum object_type cached_type; + + cached_type = get_delta_base_cache_type(p, obj_offset); + if (cached_type > 0) { + type = cached_type; + goto out; + } + /* Push the object we're going to leave behind */ if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) { poi_stack_alloc = alloc_nr(poi_stack_nr); @@ -1326,6 +1341,11 @@ static enum object_type packed_to_object_type(struct repository *r, } out: + if (type != OBJ_BAD) { + size_t i; + for (i = 0; i < poi_stack_nr; i++) + add_delta_base_cache(p, poi_stack[i], NULL, 0, type); + } if (poi_stack != small_poi_stack) free(poi_stack); return type; @@ -1385,6 +1405,13 @@ get_delta_base_cache_entry(struct packed_git *p, off_t base_offset) return e ? container_of(e, struct delta_base_cache_entry, ent) : NULL; } +static enum object_type get_delta_base_cache_type(struct packed_git *p, off_t curpos) +{ + struct delta_base_cache_entry *ent = + get_delta_base_cache_entry(p, curpos); + return ent ? ent->type : OBJ_BAD; +} + static int delta_base_cache_key_eq(const struct delta_base_cache_key *a, const struct delta_base_cache_key *b) { @@ -1433,7 +1460,7 @@ static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p, struct delta_base_cache_entry *ent; ent = get_delta_base_cache_entry(p, base_offset); - if (!ent) + if (!ent || !ent->data) return unpack_entry(r, p, base_offset, type, base_size); if (type) @@ -1677,7 +1704,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, struct delta_base_cache_entry *ent; ent = get_delta_base_cache_entry(p, curpos); - if (ent) { + if (ent && ent->data) { type = ent->type; data = ent->data; size = ent->size;