On 9/26/2017 6:20 PM, Jonathan Tan wrote:
> On Fri, 22 Sep 2017 20:26:21 +0000
> Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>
>> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>>
>> Create subclass of oidset where each entry has a
>> field to store the length of the object's content
>> and an optional pathname.
>>
>> This will be used in a future commit to build a
>> manifest of omitted objects in a partial/narrow
>> clone/fetch.
>
> As Brandon mentioned, I think "oidmap" should be the new data structure
> of choice (with "oidset" modified to use it).

I'll take a look at that. I'm not exactly happy with
my oidset2, but it works and it minimized touching other
things. But yes, it may clear up a few things.
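
To make the map-with-payload idea concrete, here is a rough, self-contained sketch of the general pattern: a generic entry keyed by object id that callers embed as the first member of their own struct. Everything in it (`demo_oid`, `demo_map`, `demo_map_put()`, `demo_map_get()`, `omitted_entry`) is a hypothetical name made up for illustration; it is not Git's actual oidmap API, and it uses a linked list instead of a hash table only to stay short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_RAWSZ 20 /* assumption: SHA-1 sized ids, for the sketch only */

struct demo_oid {
	unsigned char hash[DEMO_RAWSZ];
};

/* Generic entry; callers embed this as the FIRST member of their struct. */
struct demo_map_entry {
	struct demo_map_entry *next;
	struct demo_oid oid;
};

struct demo_map {
	struct demo_map_entry *head; /* linked list keeps the sketch short */
};

/* Add an entry to the map; the caller owns the entry's memory. */
static void demo_map_put(struct demo_map *map, struct demo_map_entry *e)
{
	assert(map && e);
	e->next = map->head;
	map->head = e;
}

/* Return the entry whose oid matches, or NULL if there is none. */
static struct demo_map_entry *demo_map_get(const struct demo_map *map,
					   const struct demo_oid *oid)
{
	struct demo_map_entry *e;

	assert(map && oid);
	for (e = map->head; e; e = e->next)
		if (!memcmp(e->oid.hash, oid->hash, DEMO_RAWSZ))
			return e;
	return NULL;
}

/* Hypothetical payload type for an omitted object. */
struct omitted_entry {
	struct demo_map_entry base; /* must stay first so the cast is valid */
	long long object_length;
	const char *pathname;
};
```

A caller would pass `&entry.base` to `demo_map_put()` and cast the result of `demo_map_get()` back to `struct omitted_entry *`. With this layout the set becomes a map whose payload happens to be empty, and per-entry fields no longer require a second container type.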

>> +struct oidset2_entry {
>> + struct hashmap_entry hash;
>> + struct object_id oid;
>> +
>> + enum object_type type;
>> + int64_t object_length; /* This is SIGNED. Use -1 when unknown. */
>> + char *pathname;
>> +};
>
> object_length is defined to be "unsigned long" in Git code, I think.
>
> When is object_length not known, and in those cases, would it be better
> to use a separate data structure to store what we need?

Yeah, I struggled with that one. Git currently treats file size as
an "unsigned long" throughout the code, which is only 32 bits on some
platforms (Windows, for example). I assume eventually there will be a
round of changes to support 64-bit values everywhere, so this
anticipates that.
I could change it to use an "unknown" flag rather than assuming -1, but
in an earlier draft I was printing -1 in the rev-list output. I can change this.
WRT a separate structure, the set I create will contain entries whose
size may or may not be known, depending on the context.
When building a list of already-missing blobs (with --filter-print-missing),
we never know the size. But when building a list of to-be-omitted blobs
(from the current set of filter options), we may or may not know it. I'm
not sure we need two _entry definitions right now.
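
For illustration, the "unknown" flag alternative might look roughly like the following. This is only a sketch under my own assumptions: `size_info`, `size_known`, `size_unknown()`, `size_of()` and `format_size()` are hypothetical names, not part of the patch or of Git. It keeps a single entry layout for both contexts and makes "size not known" explicit instead of encoding it as -1.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the size fields of an entry. */
struct size_info {
	bool size_known;        /* false: object_length carries no meaning */
	uint64_t object_length; /* unsigned, since -1 is no longer needed */
};

/* Used when the size is never available (e.g. already-missing blobs). */
static struct size_info size_unknown(void)
{
	struct size_info s = { false, 0 };
	return s;
}

/* Used when the caller happens to know the size. */
static struct size_info size_of(uint64_t len)
{
	struct size_info s = { true, len };
	return s;
}

/*
 * Format the size for display, writing "?" when it is unknown.
 * Returns whatever snprintf() returns.
 */
static int format_size(const struct size_info *s, char *buf, size_t buflen)
{
	assert(s && buf);
	if (!s->size_known)
		return snprintf(buf, buflen, "?");
	return snprintf(buf, buflen, "%" PRIu64, s->object_length);
}
```

The trade-off is one extra field per entry in exchange for keeping the length unsigned and never printing a sentinel by accident; whether that is worth it depends on how the output format ends up being defined.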