On 13/08/18 04:33, Christian Couder wrote: > On Mon, Aug 13, 2018 at 3:14 AM, Ramsay Jones [snip] >> And neither the sha1 or str hash-maps are destroyed here. >> (That is not necessarily a problem, of course! ;-) ) > > The instances are declared as static: > > static khash_sha1 *island_marks; > > static kh_str_t *remote_islands; > > so it maybe ok. Yes, that's fine. > >>> +struct island_bitmap { >>> + uint32_t refcount; >>> + uint32_t bits[]; >> >> Use FLEX_ARRAY here? We are slowly moving toward requiring >> certain C99 features, but I can't remember a flex array >> weather-balloon patch. > > This was already discussed by Junio and Peff there: > > https://public-inbox.org/git/20180727130229.GB18599@xxxxxxxxxxxxxxxxxxxxx/ > That is a fine discussion, without a firm conclusion, but I don't think you can simply do nothing here: $ cat -n junk.c 1 #include <stdint.h> 2 3 struct island_bitmap { 4 uint32_t refcount; 5 uint32_t bits[]; 6 }; 7 $ gcc --std=c89 --pedantic -c junk.c junk.c:5:11: warning: ISO C90 does not support flexible array members [-Wpedantic] uint32_t bits[]; ^~~~ $ gcc --std=c99 --pedantic -c junk.c $ >>> +}; > >>> +int in_same_island(const struct object_id *trg_oid, const struct object_id *src_oid) >> >> Hmm, what does the trg_ prefix stand for? >> >>> +{ >>> + khiter_t trg_pos, src_pos; >>> + >>> + /* If we aren't using islands, assume everything goes together. */ >>> + if (!island_marks) >>> + return 1; >>> + >>> + /* >>> + * If we don't have a bitmap for the target, we can delta it >> >> ... Ah, OK, trg_ => target. > > I am ok to replace "trg" with "target" (or maybe "dst"? or something > else) and "src" with "source" if you think it would make things > clearer. If it had been dst_ (or target), I would not have had a 'huh?' moment, but it is not all that important. > >>> +static void add_ref_to_island(const char *island_name, const struct object_id *oid) >>> +{ >>> + uint64_t sha_core; >>> + struct remote_island *rl = NULL; >>> + >>> + int hash_ret; >>> + khiter_t pos = kh_put_str(remote_islands, island_name, &hash_ret); >>> + >>> + if (hash_ret) { >>> + kh_key(remote_islands, pos) = xstrdup(island_name); >>> + kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct remote_island)); >>> + } >>> + >>> + rl = kh_value(remote_islands, pos); >>> + oid_array_append(&rl->oids, oid); >>> + >>> + memcpy(&sha_core, oid->hash, sizeof(uint64_t)); >>> + rl->hash += sha_core; >> >> Hmm, so the first 64-bits of the oid of each ref that is part of >> this island is added together as a 'hash' for the island. And this >> is used to de-duplicate the islands? Any false positives? (does it >> matter - it would only affect performance, not correctness, right?) > > I would think that a false positive from pure chance is very unlikely. > We would need to approach billions of delta islands (as 2 to the power > 64/2 is in the order of billions) for the probability to be > significant. GitHub has less than 50 millions users and it is very > unlikely that a significant proportion of these users will fork the > same repo. > > Now if there is a false positive because two forks have exactly the > same refs, then it is not a problem if they are considered the same, > because they are actually the same. Yep, good point. ATB, Ramsay Jones