Am 18.06.2017 um 13:49 schrieb Jeff King:
On Sun, Jun 18, 2017 at 12:58:37PM +0200, René Scharfe wrote:
An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times
a factor of 1.5 from alloc_nr). How many loose objects do we have to
expect? 30 MB for a million of them sounds not too bad, but can there
realistically be orders of magnitude more?
Good point. We gc by default at 6000 loose objects, and lots of thing
will suffer poor performance by the time you hit a million. So a little
extra space is probably not worth worrying about.
So here's a patch for caching loose object hashes in an oid_array
without a way to invalidate or release the cache. It uses a single
oid_array for simplicity, but reads each subdirectory individually and
remembers that fact using a bitmap.
I like the direction of this very much.
@@ -1586,6 +1587,9 @@ extern struct alternate_object_database {
struct strbuf scratch;
size_t base_len;
+ uint32_t loose_objects_subdir_bitmap[8];
Is it worth the complexity of having an actual bitmap and not just an
array of char? I guess it's not _that_ complex to access the bits, but
there are a lot of magic numbers involved.
That would be 224 bytes more per alternate_object_database, and we'd
gain simpler code. Hmm. We could add some bitmap helper macros, but
you're probably right that the first version should use the simplest
form for representing small bitmaps that we currently have.
+ struct oid_array loose_objects_cache;
We should probably insert a comment here warning that the cache may go
out of date during the process run, and should only be used when that's
an acceptable tradeoff.
Then we need to offer a way for opting out. Through a global variable?
(I'd rather reduce their number, but don't see how allow programs to
nicely toggle this cache otherwise.)
+static void read_loose_object_subdir(struct alternate_object_database *alt,
+ int subdir_nr)
I think it's nice to pull this out into a helper function. I do wonder
if it should just be reusing for_each_loose_file_in_objdir(). You'd just
need to expose the in_obj_subdir() variant publicly.
It does do slightly more than we need (for the callbacks it actually
builds the filename), but I doubt memcpy()ing a few bytes would be
measurable.
Good point. The function also copies the common first two hex digits
for each entry. But all that extra work is certainly dwarfed by the
readdir calls.
René