Re: [PATCH/RFC 0/6] reuse deltas found by bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 26, 2014 at 02:13:00PM -0400, Jeff King wrote:

> So I think the next steps are probably:
> 
>   1. Measure the "all objects are preferred bases" approach and confirm
>      that it is bad.

Below is a very rough patch to accomplish this. It just traverses the
"have" bitmap and adds every object with the "exclude" flag. The result
is as comically bad as I expected.

For a big fetch, it seems like it's working (numbers against v1.9.0):

  5311.31: server (128 days)   4.49(7.35+0.23)   4.98(6.82+3.31) +10.9%
  5311.32: size   (128 days)             25.8M             32.0M +24.2%
  5311.33: client (128 days)   7.17(7.38+0.20)   7.33(7.90+0.20) +2.2%

A modest increase in CPU time, and we get back most of our size
(remember that our "bad" case here is ~80M).

But for a small fetch...

  5311.3: server   (1 days)    0.20(0.17+0.03)   4.39(4.03+6.59) +2095.0%
  5311.4: size     (1 days)              57.2K             59.5K +4.1%
  5311.5: client   (1 days)    0.08(0.08+0.00)   0.08(0.08+0.00) +0.0%

Yikes. Besides spending lots of CPU on handling the enlarged object
list, notice that we still only dropped the size in the 128-day case to
32M. Which is almost exactly what the earlier "reuse on-disk deltas"
patch achieved.

What I think is happening is that we manage to reuse those on-disk
deltas (since they are now preferred bases, and we know we can). But we
never actually come up with any _new_ deltas, because the search window
is overwhelmed with candidates. So not only do we waste a huge amount of
CPU, but we just end up at the same not-quite-optimal result as before.

So this is a dead end, but I think it was good to double-check that.

The patch below is messy and would probably be better split into a few
patches, but I don't expect anyone to apply it (or even read it,
really). It's just for reference.

---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0ee5f1f..1a5d401 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1026,7 +1026,7 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1,
 	if (have_duplicate_entry(sha1, 0, &index_pos))
 		return 0;
 
-	create_object_entry(sha1, type, name_hash, 0, 0, index_pos, pack, offset);
+	create_object_entry(sha1, type, name_hash, flags, 0, index_pos, pack, offset);
 
 	display_progress(progress_state, to_pack.nr_objects);
 	return 1;
@@ -2436,6 +2436,7 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 	}
 
 	traverse_bitmap_commit_list(&add_object_entry_from_bitmap);
+	bitmap_have_foreach(&add_object_entry_from_bitmap);
 	return 0;
 }
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1bae7e8..f4e30f5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -605,6 +605,7 @@ static void show_objects_for_type(
 	struct bitmap *objects,
 	struct ewah_bitmap *type_filter,
 	enum object_type object_type,
+	int flags,
 	show_reachable_fn show_reach)
 {
 	size_t pos = 0, i = 0;
@@ -613,9 +614,6 @@ static void show_objects_for_type(
 	struct ewah_iterator it;
 	eword_t filter;
 
-	if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
-		return;
-
 	ewah_iterator_init(&it, type_filter);
 
 	while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
@@ -640,7 +638,7 @@ static void show_objects_for_type(
 			if (bitmap_git.hashes)
 				hash = ntohl(bitmap_git.hashes[entry->nr]);
 
-			show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset);
+			show_reach(sha1, object_type, flags, hash, bitmap_git.pack, entry->offset);
 		}
 
 		pos += BITS_IN_WORD;
@@ -816,14 +814,17 @@ void traverse_bitmap_commit_list(show_reachable_fn show_reachable)
 {
 	assert(bitmap_git.result);
 
+	if (bitmap_git.reuse_objects == bitmap_git.pack->num_objects)
+		return;
+
 	show_objects_for_type(bitmap_git.result, bitmap_git.commits,
-		OBJ_COMMIT, show_reachable);
+		OBJ_COMMIT, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.trees,
-		OBJ_TREE, show_reachable);
+		OBJ_TREE, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.blobs,
-		OBJ_BLOB, show_reachable);
+		OBJ_BLOB, 0, show_reachable);
 	show_objects_for_type(bitmap_git.result, bitmap_git.tags,
-		OBJ_TAG, show_reachable);
+		OBJ_TAG, 0, show_reachable);
 
 	show_extended_objects(bitmap_git.result, show_reachable);
 
@@ -1090,3 +1091,18 @@ int bitmap_have(const unsigned char *sha1)
 
 	return bitmap_get(bitmap_git.haves, pos);
 }
+
+void bitmap_have_foreach(show_reachable_fn show_reachable)
+{
+	if (!bitmap_git.haves)
+		return;
+
+	show_objects_for_type(bitmap_git.haves, bitmap_git.commits,
+		OBJ_COMMIT, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.trees,
+		OBJ_TREE, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.blobs,
+		OBJ_BLOB, 1, show_reachable);
+	show_objects_for_type(bitmap_git.haves, bitmap_git.tags,
+		OBJ_TAG, 1, show_reachable);
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index a63ee6b..02c08f8 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -50,6 +50,7 @@ int reuse_partial_packfile_from_bitmap(struct packed_git **packfile, uint32_t *e
 int rebuild_existing_bitmaps(struct packing_data *mapping, khash_sha1 *reused_bitmaps, int show_progress);
 
 int bitmap_have(const unsigned char *sha1);
+void bitmap_have_foreach(show_reachable_fn);
 
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]