On Sun, Jun 18, 2017 at 12:58:37PM +0200, René Scharfe wrote: > An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times > a factor of 1.5 from alloc_nr). How many loose objects do we have to > expect? 30 MB for a million of them sounds not too bad, but can there > realistically be orders of magnitude more? Good point. We gc by default at 6000 loose objects, and lots of thing will suffer poor performance by the time you hit a million. So a little extra space is probably not worth worrying about. > So here's a patch for caching loose object hashes in an oid_array > without a way to invalidate or release the cache. It uses a single > oid_array for simplicity, but reads each subdirectory individually and > remembers that fact using a bitmap. I like the direction of this very much. > @@ -1586,6 +1587,9 @@ extern struct alternate_object_database { > struct strbuf scratch; > size_t base_len; > > + uint32_t loose_objects_subdir_bitmap[8]; Is it worth the complexity of having an actual bitmap and not just an array of char? I guess it's not _that_ complex to access the bits, but there are a lot of magic numbers involved. > + struct oid_array loose_objects_cache; We should probably insert a comment here warning that the cache may go out of date during the process run, and should only be used when that's an acceptable tradeoff. > +static void read_loose_object_subdir(struct alternate_object_database *alt, > + int subdir_nr) I think it's nice to pull this out into a helper function. I do wonder if it should just be reusing for_each_loose_file_in_objdir(). You'd just need to expose the in_obj_subdir() variant publicly. It does do slightly more than we need (for the callbacks it actually builds the filename), but I doubt memcpy()ing a few bytes would be measurable. > + pos = oid_array_lookup(&alt->loose_objects_cache, &ds->bin_pfx); > + if (pos < 0) > + pos = -1 - pos; > + while (!ds->ambiguous && pos < alt->loose_objects_cache.nr) { > + const struct object_id *oid; > + oid = alt->loose_objects_cache.oid + pos; > + if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash)) > + break; > + update_candidates(ds, oid); > + pos++; > } This part looks quite nice and straightforward. -Peff