Re: looking up object types quickly, was Re: [PATCH 1/1] commit-graph.c: avoid unnecessary tag dereference when merging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Mar 22, 2020 at 02:45:19PM -0400, Jeff King wrote:

> 3. To find the type of a delta object, we have to walk back to a base
>    object with a concrete type. When we access the actual object
>    contents, we cache intermediate bases to avoid digging down the same
>    chain over and over. But I don't think there's any such cache for the
>    type lookup. So if you have a 50-deep delta chain and want the type
>    of each object, you'll walk down 49 links, then 48 links, and so on.
>    It probably wouldn't be that hard to have a small type cache (or just
>    allow the existing cache to have entries for "I know the type, but
>    don't have the contents").
> 
> I suspect (1) would give the biggest speedup, but is more work and
> requires bitmaps. I think (3) is likely to give a moderate win and
> probably isn't that big a patch.

Sadly, it doesn't seem to help. Here's a hacky implementation (I tested
it only on my limited cat-file case; I wouldn't be at all surprised if
it causes problems for users of the delta cache that want the actual
object contents).

After some rudimentary profiling (which I _should_ have done in the
first place before opening my mouth), it looks like finding the types
isn't actually the most expensive thing anyway: it's just dealing with
the sheer number of objects in the pack.

That implies that teaching packed_object_info() to use bitmaps to find
the individual types wouldn't help much either (but being able to ask
"just give me the commits" and iterating over the commit-type bitmap
probably _would_).

Anyway, here's the patch for the curious.

---
diff --git a/packfile.c b/packfile.c
index f4e752996d..482733689a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1273,6 +1273,13 @@ static int retry_bad_packed_offset(struct repository *r,
 
 #define POI_STACK_PREALLOC 64
 
+/* some reordering of static functions could avoid these */
+static enum object_type get_delta_base_cache_type(struct packed_git *p,
+						  off_t curpos);
+static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
+				 void *base, unsigned long base_size,
+				 enum object_type type);
+
 static enum object_type packed_to_object_type(struct repository *r,
 					      struct packed_git *p,
 					      off_t obj_offset,
@@ -1287,6 +1294,14 @@ static enum object_type packed_to_object_type(struct repository *r,
 	while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
 		off_t base_offset;
 		unsigned long size;
+		enum object_type cached_type;
+
+		cached_type = get_delta_base_cache_type(p, obj_offset);
+		if (cached_type > 0) {
+			type = cached_type;
+			goto out;
+		}
+
 		/* Push the object we're going to leave behind */
 		if (poi_stack_nr >= poi_stack_alloc && poi_stack == small_poi_stack) {
 			poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1326,6 +1341,11 @@ static enum object_type packed_to_object_type(struct repository *r,
 	}
 
 out:
+	if (type != OBJ_BAD) {
+		size_t i;
+		for (i = 0; i < poi_stack_nr; i++)
+			add_delta_base_cache(p, poi_stack[i], NULL, 0, type);
+	}
 	if (poi_stack != small_poi_stack)
 		free(poi_stack);
 	return type;
@@ -1385,6 +1405,13 @@ get_delta_base_cache_entry(struct packed_git *p, off_t base_offset)
 	return e ? container_of(e, struct delta_base_cache_entry, ent) : NULL;
 }
 
+static enum object_type get_delta_base_cache_type(struct packed_git *p, off_t curpos)
+{
+	struct delta_base_cache_entry *ent =
+		get_delta_base_cache_entry(p, curpos);
+	return ent ? ent->type : OBJ_BAD;
+}
+
 static int delta_base_cache_key_eq(const struct delta_base_cache_key *a,
 				   const struct delta_base_cache_key *b)
 {
@@ -1433,7 +1460,7 @@ static void *cache_or_unpack_entry(struct repository *r, struct packed_git *p,
 	struct delta_base_cache_entry *ent;
 
 	ent = get_delta_base_cache_entry(p, base_offset);
-	if (!ent)
+	if (!ent || !ent->data)
 		return unpack_entry(r, p, base_offset, type, base_size);
 
 	if (type)
@@ -1677,7 +1704,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
 		struct delta_base_cache_entry *ent;
 
 		ent = get_delta_base_cache_entry(p, curpos);
-		if (ent) {
+		if (ent && ent->data) {
 			type = ent->type;
 			data = ent->data;
 			size = ent->size;



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux