On Fri, Oct 11, 2019 at 1:59 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote: > > I'm going to start with pack-bitmap.h, then builtin/pack-objects.c. > > > int reuse_partial_packfile_from_bitmap(struct bitmap_index *, > > struct packed_git **packfile, > > - uint32_t *entries, off_t *up_to); > > + uint32_t *entries, > > + struct bitmap **bitmap); > > Makes sense. The existing code determines if the given packfile is > suitable, and if yes, the span of the packfile to use. With this patch, > we resolve to use the given packfile, and want to compute which objects > to use, storing the computation result in an uncompressed bitmap. > (Resolving to use the given packfile is not that much different from > existing behavior, as far as I know, since we currently only consider > one packfile at most anyway.) > > So replacing the out param makes sense, although a more descriptive name > than "bitmap" would be nice. Yeah, in pack-bitmap.c this argument is called "reuse_out", so the same name could be used in pack-bitmap.c too. I changed that on my current version. > > +/* > > + * Record the offsets needed in our reused packfile chunks due to > > + * "gaps" where we omitted some objects. > > + */ > > +static struct reused_chunk { > > + off_t start; > > + off_t offset; > > +} *reused_chunks; > > +static int reused_chunks_nr; > > +static int reused_chunks_alloc; > > This makes sense - offsets may be different when we omit objects from > the packfile. I think this can be computed by calculating the number of > zero bits between the current object's index and the nth object prior > (where n is the offset) in the bitmap resulting from > reuse_partial_packfile_from_bitmap() above, thus eliminating the need > for this array, but I haven't tested it. Thanks for the idea. I will let others comment on that though before maybe trying to implement it. I think it could come as a further optimization in a following patch if it makes things faster or reduce memory usage. > > + if (0) { > > + off_t expected_size = cur - offset; > > + > > + if (len + ofs_len < expected_size) { > > + unsigned max_pad = (len >= 4) ? 9 : 5; > > + header[len - 1] |= 0x80; > > + while (len < max_pad && len + ofs_len < expected_size) > > + header[len++] = 0x80; > > + header[len - 1] &= 0x7F; > > + } > > + } > > This if(0) should be deleted. Yeah, good eyes. I removed it on my current version. > > @@ -1002,6 +1132,10 @@ static int have_duplicate_entry(const struct object_id *oid, > > { > > struct object_entry *entry; > > > > + if (reuse_packfile_bitmap && > > + bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid)) > > + return 1; > > Hmm...why did we previously not need to check the reuse information, but > we do now? I gave the code a cursory glance but couldn't find the > answer. > > > @@ -2571,7 +2706,9 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size, > > > > static int obj_is_packed(const struct object_id *oid) > > { > > - return !!packlist_find(&to_pack, oid, NULL); > > + return packlist_find(&to_pack, oid, NULL) || > > + (reuse_packfile_bitmap && > > + bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid)); > > Same question here - why do we need to check the reuse information? Maybe a reuse bitmap makes it cheap enough to check that information? Thank you for the review, Christian.