Re: [PATCH] promisor-remote.c: use oidset for deduplication

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> From what I understand of GGG I think you updated only the summary on
> the PR but not the commit itself, the latter is what would go into
> git.git. Here the commit should be updated so we get this message.

Ah, I have long learned to ignore the blurb after the three-dash
line when a patch comes via GGG (because the PR text is often
useless and most often people seem to use text identical to the
commit log message).  I didn't realize that the proposed commit log
message was useless in this particular patch.  I was wondering why
you were talking about an extra blank line after the title there ;-)

> The part after the "---" is usually just used in this project for ad-hoc
> list-only comment.

True.  "What I'm not sure about is ..." however does belong to the
list-only comment, though.  If it is not 

> The difference between these two APIs is thaht oidset is hash-backed,
> and you'd insert into it and we de-duplicate any duplicate OIDs on-the-fly.
> 
> The oid_array is just an realloc()'d "struct object_id *oid". On
> insertion you can insert duplicates, but it has the ability to track
> "I've sorted these", and "let's iterate over this sorted, and de-dup any
> duplicates".
>
> We have the two APIs for a reason, but I don't know in any of these
> cases whether this change is safe.
>
> Does e.g. index-pack.c always receive de-duplicated OIDs and we were
> wasting CPU cycles using an oidset?
>
> Do some of these like pack-objects.c receive de-duplicated OIDs from
> e.g. "git repack" *now*, but we just lack test coverage to see that
> they're happy to get duplicate OIDs on stdin (e.g. manually from a
> user), and this would introduce a bug?

Also, if oid_array is used to produce a de-duplicated list of object
names in the current code, it is very likely that oid_array is
sorted (perhaps the objects are fed in sorted order), and the
callers depend on the order of the objects they find in the array.
Throwing sorted list of object names at oidset and then iterating
over what is in the oidset would likely to destroy the original
ordering.  I do not offhand know if the callers are broken by such a
change (either correctness-wise or performance-wise).

> But most importantly is it worth it? What's the rationale for the
> change? Less CPU/memory use? Getting e.g. "hyperfine" or "/usr/bin/time
> -v" output for those (if so) would be valuable.

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux