On Tue, Aug 09, 2016 at 10:31:43PM +0300, Kirill Smelkov wrote: > Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there > are two codepaths in pack-objects: with & without using bitmap > reachability index. Sorry, I got distracted from reviewing these patches. I'll give them a detailed look now and hopefully we can finalize the topic. > In want_object_in_pack() we care to start the checks from already found > pack, if we have one, this way determining the answer right away > in case neither --local nor --honour-pack-keep are active. In > particular, as p5310-pack-bitmaps.sh shows, we do not do harm to > served-with-bitmap clones performance-wise: > > Test 56dfeb62 this tree > ----------------------------------------------------------------- > 5310.2: repack to disk 9.63(8.67+0.33) 9.47(8.55+0.28) -1.7% > 5310.3: simulated clone 2.07(2.17+0.12) 2.03(2.14+0.12) -1.9% > 5310.4: simulated fetch 0.78(1.03+0.02) 0.76(1.00+0.03) -2.6% > 5310.6: partial bitmap 1.97(2.43+0.15) 1.92(2.36+0.14) -2.5% > > with all differences strangely showing we are a bit faster now, but > probably all being within noise. Good to know there is no regression. It is curious that there is a slight _improvement_ across the board. Do we have an explanation for that? It seems odd that noise would be so consistent. > And in the general case we care not to have duplicate > find_pack_entry_one(*found_pack) calls. Worst what can happen is we can > call want_found_object(*found_pack) -- newly introduced helper for > checking whether we want object -- twice, but since want_found_object() > is very lightweight it does not make any difference. I had trouble parsing this. I think maybe: In the general case we do not want to call find_pack_entry_one() more than once, because it is expensive. This patch splits the loop in want_object_in_pack() into two parts: finding the object and seeing if it impacts our choice to include it in the pack. We may call the inexpensive want_found_object() twice, but we will never call find_pack_entry_one() if we do not need to. > +static int want_found_object(int exclude, struct packed_git *p) > +{ > + if (exclude) > + return 1; > + if (incremental) > + return 0; > + > + /* > + * When asked to do --local (do not include an object that appears in a > + * pack we borrow from elsewhere) or --honor-pack-keep (do not include > + * an object that appears in a pack marked with .keep), finding a pack > + * that matches the criteria is sufficient for us to decide to omit it. > + * However, even if this pack does not satisfy the criteria, we need to > + * make sure no copy of this object appears in _any_ pack that makes us > + * to omit the object, so we need to check all the packs. Signal that by > + * returning -1 to the caller. > + */ > + if (!ignore_packed_keep && > + (!local || !have_non_local_packs)) > + return 1; Hmm. The comment says "-1", but the return says "1". That is because the comment is describing the return that happens at the end. :) I wonder if the last sentence should be: We can check here whether these options can possibly matter; if not, we can return early from the function here. Otherwise, we signal "-1" at the end to tell the caller that we do not know either way, and it needs to check more packs. > - *found_pack = NULL; > - *found_offset = 0; > + /* > + * If we already know the pack object lives in, start checks from that > + * pack - in the usual case when neither --local was given nor .keep files > + * are present we will determine the answer right now. > + */ > + if (*found_pack) { > + want = want_found_object(exclude, *found_pack); > + if (want != -1) > + return want; > + } Looks correct. Though it is not really "start checks from..." anymore, but rather "do a quick check to see if we can quit early, and otherwise start the loop". That might be nitpicking, though. > for (p = packed_git; p; p = p->next) { > - off_t offset = find_pack_entry_one(sha1, p); > + off_t offset; > + > + if (p == *found_pack) > + offset = *found_offset; > + else > + offset = find_pack_entry_one(sha1, p); > + This hunk will conflict with the MRU optimizations in 'next', but I think the resolution should be pretty trivial. > static int add_object_entry(const unsigned char *sha1, enum object_type type, > const char *name, int exclude) > { > - struct packed_git *found_pack; > - off_t found_offset; > + struct packed_git *found_pack = NULL; > + off_t found_offset = 0; I think technically we don't need to initialize found_offset here (it is considered only if *found_pack is not NULL), but it doesn't hurt to make our starting assumptions clear. > @@ -1073,6 +1097,9 @@ static int add_object_entry_from_bitmap(const unsigned char *sha1, > if (have_duplicate_entry(sha1, 0, &index_pos)) > return 0; > > + if (!want_object_in_pack(sha1, 0, &pack, &offset)) > + return 0; > + And this caller doesn't need to worry about initialization, because of course it knows it has a pack/offset already. Good. > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > index 3893afd..a278d30 100755 > --- a/t/t5310-pack-bitmaps.sh > +++ b/t/t5310-pack-bitmaps.sh Tests look OK. I saw a few style nitpicks, but I think they are not even against our style guide but more "I would have written it like this" and are not even worth quibbling over. So I think the code here is fine, and I just had a few minor complaints on comment and commit message clarity. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html