On 03.04.20 at 14:15, Jeff King wrote:
> We use a custom hash in fast-import to store the set of objects we've
> imported so far. It has a fixed set of 2^16 buckets and chains any
> collisions with a linked list. As the number of objects grows larger
> than that, the load factor increases and we degrade to O(n) lookups
> and O(n^2) insertions.
>
> We can scale better by using our hashmap.c implementation, which will
> resize the bucket count as we grow. This does incur an extra memory
> cost of 8 bytes per object, as hashmap stores the integer hash value
> for each entry in its hashmap_entry struct (which we really don't care
> about here, because we're just reusing the embedded object hash). But
> I think the numbers below justify this (and our per-object memory cost
> is already much higher).
>
> I also looked at using khash, but it seemed to perform slightly worse
> than hashmap at all sizes, and worse even than the existing code for
> small sizes. It's also awkward to use here, because we want to look up
> a "struct object_entry" from a "struct object_id", and it doesn't
> handle mismatched keys as well. Making a mapping of object_id to
> object_entry would be more natural, but that would require pulling the
> embedded oid out of the object_entry or incurring an extra 32 bytes
> per object.
>
> In a synthetic test creating as many cheap, tiny objects as possible
>
>   perl -e '
>     my $bits = shift;
>     my $nr = 2**$bits;
>
>     for (my $i = 0; $i < $nr; $i++) {
>             print "blob\n";
>             print "data 4\n";
>             print pack("N", $i);
>     }
>   ' $bits | git fast-import
>
> I got these results:
>
>   nr_objects   master        khash       hashmap
>   2^20         0m4.317s      0m5.109s    0m3.890s
>   2^21         0m10.204s     0m9.702s    0m7.933s
>   2^22         0m27.159s     0m17.911s   0m16.751s
>   2^23         1m19.038s     0m35.080s   0m31.963s
>   2^24         4m18.766s     1m10.233s   1m6.793s

I get similar numbers.

> which points to hashmap as the winner. We didn't have any perf tests
> for fast-export or fast-import, so I added one as a more real-world
> case. It uses an export without blobs since that's significantly
> cheaper than a full one, but still is an interesting case people might
> use (e.g., for rewriting history). It will emphasize this change in
> some ways (as a percentage we spend more time making objects and less
> shuffling blob bytes around) and less in others (the total object
> count is lower).
>
> Here are the results for linux.git:
>
>   Test                        HEAD^                 HEAD
>   ----------------------------------------------------------------------------
>   9300.1: export (no-blobs)   67.64(66.96+0.67)     67.81(67.06+0.75) +0.3%
>   9300.2: import (no-blobs)   284.04(283.34+0.69)   198.09(196.01+0.92) -30.3%

My numbers look a bit different for this, not sure why:

Test                        origin/master         HEAD
---------------------------------------------------------------------------
9300.1: export (no-blobs)   69.36(66.44+1.56)     67.89(66.07+1.56) -2.1%
9300.2: import (no-blobs)   295.10(293.83+1.19)   283.83(282.91+0.91) -3.8%

They are still in favor of the patch, just not as strongly as yours.

> It only has ~5.2M commits and trees, so this is a larger effect than I
> expected (the 2^23 case above only improved by 50s or so, but here we
> gained almost 90s). This is probably due to actually performing more
> object lookups in a real import with trees and commits, as opposed to
> just dumping a bunch of blobs into a pack.
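That matches the arithmetic: ~5.2M commits and trees spread over a
fixed 2^16 buckets means chains of roughly 80 entries on average, and
every lookup and insertion has to walk such a chain.  A toy model of
the old scheme (purely illustrative -- made-up names and a trivial
hash, not the fast-import code) shows where the time goes:

	/*
	 * Fixed number of buckets, collisions chained in a list, like
	 * the old object_table.  With n inserted keys the average
	 * chain length is n / NR_BUCKETS, so each further insert
	 * scans O(n) entries and the whole run is O(n^2).
	 */
	#include <stdio.h>
	#include <stdlib.h>

	#define NR_BUCKETS (1 << 16)

	struct toy_entry {
		unsigned int key;
		struct toy_entry *next;
	};

	static struct toy_entry *buckets[NR_BUCKETS];

	static struct toy_entry *toy_insert(unsigned int key)
	{
		unsigned int h = key & (NR_BUCKETS - 1);
		struct toy_entry *e;

		/* the chain scan is what degrades as the table fills up */
		for (e = buckets[h]; e; e = e->next)
			if (e->key == key)
				return e;

		e = malloc(sizeof(*e));
		if (!e)
			exit(1);
		e->key = key;
		e->next = buckets[h];
		buckets[h] = e;
		return e;
	}

	int main(void)
	{
		unsigned int i, nr = 1u << 22;

		/* at 2^22 keys each bucket already holds 64 entries */
		for (i = 0; i < nr; i++)
			toy_insert(i);

		printf("inserted %u keys into %d buckets\n", nr, NR_BUCKETS);
		return 0;
	}

hashmap.c sidesteps this by growing the number of buckets along with
the number of entries, so the chains stay short no matter how big the
import gets.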
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  fast-import.c                      | 62 +++++++++++++++++++-----------
>  t/perf/p9300-fast-import-export.sh | 23 +++++++++++
>  2 files changed, 62 insertions(+), 23 deletions(-)
>  create mode 100755 t/perf/p9300-fast-import-export.sh
>

I have to warn you up front: I'm not familiar with hashmap or
fast-import, so I'll focus on stylistic topics -- bikeshedding.  The
actual change looks reasonable to me overall and the performance is
convincing.

> diff --git a/fast-import.c b/fast-import.c
> index 202dda11a6..0ef6defc10 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -39,12 +39,28 @@
>
>  struct object_entry {
>  	struct pack_idx_entry idx;
> -	struct object_entry *next;
> +	struct hashmap_entry ent;
>  	uint32_t type : TYPE_BITS,
>  		pack_id : PACK_ID_BITS,
>  		depth : DEPTH_BITS;
>  };
>
> +static int object_entry_hashcmp(const void *map_data,
> +				const struct hashmap_entry *eptr,
> +				const struct hashmap_entry *entry_or_key,
> +				const void *keydata)
> +{
> +	const struct object_id *oid = keydata;
> +	const struct object_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct object_entry, ent);
> +	if (oid)
> +		return oidcmp(&e1->idx.oid, oid);
> +
> +	e2 = container_of(entry_or_key, const struct object_entry, ent);
> +	return oidcmp(&e1->idx.oid, &e2->idx.oid);

Other hashmap users express this more tersely, similar to this:

	const struct object_entry *e1, *e2;

	e1 = container_of(eptr, const struct object_entry, ent);
	e2 = container_of(entry_or_key, const struct object_entry, ent);

	return oidcmp(&e1->idx.oid, keydata ? keydata : &e2->idx.oid);

> +}
> +
>  struct object_entry_pool {
>  	struct object_entry_pool *next_pool;
>  	struct object_entry *next_free;
> @@ -178,7 +194,7 @@ static off_t pack_size;
>  /* Table of objects we've written. */
>  static unsigned int object_entry_alloc = 5000;
>  static struct object_entry_pool *blocks;
> -static struct object_entry *object_table[1 << 16];
> +static struct hashmap object_table;
>  static struct mark_set *marks;
>  static const char *export_marks_file;
>  static const char *import_marks_file;
> @@ -455,44 +471,42 @@ static struct object_entry *new_object(struct object_id *oid)
>
>  static struct object_entry *find_object(struct object_id *oid)
>  {
> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
> -	struct object_entry *e;
> -	for (e = object_table[h]; e; e = e->next)
> -		if (oideq(oid, &e->idx.oid))
> -			return e;
> +	struct hashmap_entry lookup_entry, *e;
> +
> +	hashmap_entry_init(&lookup_entry, oidhash(oid));
> +	e = hashmap_get(&object_table, &lookup_entry, oid);
> +	if (e)
> +		return container_of(e, struct object_entry, ent);

(Worth repeating:) No casting trick, yay!

>  	return NULL;
>  }
>
>  static struct object_entry *insert_object(struct object_id *oid)
>  {
> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
> -	struct object_entry *e = object_table[h];
> +	struct object_entry *e;
> +	struct hashmap_entry lookup_entry, *hashent;
>
> -	while (e) {
> -		if (oideq(oid, &e->idx.oid))
> -			return e;
> -		e = e->next;
> -	}
> +	hashmap_entry_init(&lookup_entry, oidhash(oid));
> +	hashent = hashmap_get(&object_table, &lookup_entry, oid);
> +	if (hashent)
> +		return container_of(hashent, struct object_entry, ent);

That duplicates find_object()...

>
>  	e = new_object(oid);
> -	e->next = object_table[h];
>  	e->idx.offset = 0;
> -	object_table[h] = e;
> +	e->ent.hash = lookup_entry.hash;

... except for this part.  Would it make sense to replace this line
with a call to hashmap_entry_init()?  Seems cleaner to me.
It would look like this:

	struct object_entry *e = find_object(oid);

	if (e)
		return e;

	e = new_object(oid);
	e->idx.offset = 0;
	hashmap_entry_init(&e->ent, oidhash(oid));
	hashmap_add(&object_table, &e->ent);
	return e;

> +	hashmap_add(&object_table, &e->ent);
>  	return e;
>  }
>
>  static void invalidate_pack_id(unsigned int id)
>  {
> -	unsigned int h;
>  	unsigned long lu;
>  	struct tag *t;
> +	struct hashmap_iter iter;
> +	struct object_entry *e;
>
> -	for (h = 0; h < ARRAY_SIZE(object_table); h++) {
> -		struct object_entry *e;
> -
> -		for (e = object_table[h]; e; e = e->next)
> -			if (e->pack_id == id)
> -				e->pack_id = MAX_PACK_ID;
> +	hashmap_for_each_entry(&object_table, &iter, e, ent) {
> +		if (e->pack_id == id)
> +			e->pack_id = MAX_PACK_ID;
>  	}
>
>  	for (lu = 0; lu < branch_table_sz; lu++) {
> @@ -3511,6 +3525,8 @@ int cmd_main(int argc, const char **argv)
>  	avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*));
>  	marks = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
>
> +	hashmap_init(&object_table, object_entry_hashcmp, NULL, 0);
> +
>  	/*
>  	 * We don't parse most options until after we've seen the set of
>  	 * "feature" lines at the start of the stream (which allows the command
> diff --git a/t/perf/p9300-fast-import-export.sh b/t/perf/p9300-fast-import-export.sh
> new file mode 100755
> index 0000000000..c3c743d04a
> --- /dev/null
> +++ b/t/perf/p9300-fast-import-export.sh
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +
> +test_description='test fast-import and fast-export performance'
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +# Use --no-data here to produce a vastly smaller export file.
> +# This is much cheaper to work with but should still exercise
> +# fast-import pretty well (we'll still process all commits and
> +# trees, which account for 60% or more of objects in most repos).
> +#
> +# Use --reencode to avoid the default of aborting on non-utf8 commits,
> +# which lets this test run against a wider variety of sample repos.
> +test_perf 'export (no-blobs)' '
> +	git fast-export --reencode=yes --no-data HEAD >export
> +'
> +
> +test_perf 'import (no-blobs)' '
> +	git fast-import --force <export
> +'
> +
> +test_done
>
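Putting both of the nits above together, the two functions would then
read roughly like this (untested, just to show how the suggestions
combine):

	static int object_entry_hashcmp(const void *map_data,
					const struct hashmap_entry *eptr,
					const struct hashmap_entry *entry_or_key,
					const void *keydata)
	{
		const struct object_entry *e1, *e2;

		e1 = container_of(eptr, const struct object_entry, ent);
		e2 = container_of(entry_or_key, const struct object_entry, ent);

		/* keydata is the oid on lookup, NULL when comparing two entries */
		return oidcmp(&e1->idx.oid, keydata ? keydata : &e2->idx.oid);
	}

	static struct object_entry *insert_object(struct object_id *oid)
	{
		struct object_entry *e = find_object(oid);

		if (e)
			return e;

		e = new_object(oid);
		e->idx.offset = 0;
		hashmap_entry_init(&e->ent, oidhash(oid));
		hashmap_add(&object_table, &e->ent);
		return e;
	}

That computes oidhash() twice on a miss (once in find_object() and once
here), but it's only a few byte loads, so that shouldn't matter.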