Am 11.09.21 um 16:26 schrieb Jeff King: > On Sat, Sep 11, 2021 at 10:01:40AM +0200, René Scharfe wrote: > >> Store the object ID of broken pack entries in an oidset instead of >> keeping only their hashes in an unsorted array. The resulting code is >> shorter and easier to read. It also handles the (hopefully) very rare >> case of having a high number of bad objects better. > > Yay, I'm very happy to see this kind of cleanup replacing ad hoc data > structures with well-tested ones. > >> @@ -303,15 +304,9 @@ static int nth_midxed_pack_entry(struct repository *r, >> if (!is_pack_valid(p)) >> return 0; >> >> - if (p->num_bad_objects) { >> - uint32_t i; >> - struct object_id oid; >> - nth_midxed_object_oid(&oid, m, pos); >> - for (i = 0; i < p->num_bad_objects; i++) >> - if (hasheq(oid.hash, >> - p->bad_object_sha1 + the_hash_algo->rawsz * i)) >> - return 0; >> - } >> + nth_midxed_object_oid(&oid, m, pos); >> + if (oidset_contains(&p->bad_objects, &oid)) >> + return 0; > > Calling nth_midxed_object_oid() implies a memcpy() under the hood. In > the old code, we'd skip that in the common case that we had no corrupt > objects, but now we'll pay the cost regardless. memcpy() isn't _that_ > expensive, but I'd expect this to be a relatively hot code path. > > Is it worth sticking all of this inside: > > if (oidset_size(&p->bad_objects)) > > ? Hard to say. It would certainly match the old code more closely. Is a function call cheaper than copying 32 bytes? Depends on the CPU and whether the hash is cached, I guess. And cached it probably is, because the caller did a binary search for it.. We can pass on the original oid to avoid the nth_midxed_object_oid() call, but inlining the whole thing might even be nicer. René