Re: [PATCH 02/13] oidset2: create oidset subclass with object length and pathname

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Wed, 27 Sep 2017 10:47:13 -0400

On 9/26/2017 6:20 PM, Jonathan Tan wrote:
On Fri, 22 Sep 2017 20:26:21 +0000
Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Create subclass of oidset where each entry has a
field to store the length of the object's content
and an optional pathname.

This will be used in a future commit to build a
manifest of omitted objects in a partial/narrow
clone/fetch.

As Brandon mentioned, I think "oidmap" should be the new data structure
of choice (with "oidset" modified to use it).

I'll take a look at that. I'm not entirely happy with
my oidset2, but it works and it minimized changes to other
code. But yes, switching to oidmap may clear up a few things.
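To make the oidmap suggestion concrete, here is a minimal standalone sketch of the pattern. It is not Git's actual API. The idea is a map keyed by object id whose base entry is embedded as the first member of a caller-defined struct, so a plain set and an omitted-objects manifest can share one lookup implementation. All names (`oidmap_sketch`, `omitted_entry_sketch`, and so on) are hypothetical. The sketch assumes 20-byte SHA-1 ids and a fixed bucket count with no resizing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OID_RAWSZ 20      /* assumption: SHA-1 sized object ids */
#define NR_BUCKETS 64     /* fixed size for the sketch; a real map would grow */

struct object_id_sketch {
	unsigned char hash[OID_RAWSZ];
};

/* Hypothetical base entry: must be the FIRST member of every caller entry. */
struct oidmap_entry_sketch {
	struct oidmap_entry_sketch *next;   /* bucket chain */
	struct object_id_sketch oid;
};

struct oidmap_sketch {
	struct oidmap_entry_sketch *buckets[NR_BUCKETS];
};

static unsigned bucket_of(const struct object_id_sketch *oid)
{
	/* Object ids are already uniformly distributed hashes,
	 * so the first four bytes serve directly as the hash. */
	uint32_t h;
	memcpy(&h, oid->hash, sizeof(h));
	return h % NR_BUCKETS;
}

/* Returns the caller's struct (base entry sits at offset 0), or NULL. */
static void *oidmap_sketch_get(const struct oidmap_sketch *map,
			       const struct object_id_sketch *oid)
{
	struct oidmap_entry_sketch *e;
	for (e = map->buckets[bucket_of(oid)]; e; e = e->next)
		if (!memcmp(e->oid.hash, oid->hash, OID_RAWSZ))
			return e;
	return NULL;
}

/* Caller owns the entry memory; returns 0 if the oid was already present. */
static int oidmap_sketch_put(struct oidmap_sketch *map, void *entry)
{
	struct oidmap_entry_sketch *e = entry;
	unsigned b;
	assert(e != NULL);
	if (oidmap_sketch_get(map, &e->oid))
		return 0;
	b = bucket_of(&e->oid);
	e->next = map->buckets[b];
	map->buckets[b] = e;
	return 1;
}

/* An "oidset" is then just a map whose entries carry no payload... */
struct oidset_sketch {
	struct oidmap_sketch map;
};

/* ...while an omitted-object manifest embeds the base entry and adds fields. */
struct omitted_entry_sketch {
	struct oidmap_entry_sketch base;   /* must come first */
	long object_length;
	const char *pathname;
};
```

Usage: fill in `base.oid`, set the payload fields, and call `oidmap_sketch_put()`. Because the base entry is at offset zero, a lookup hands back the full `omitted_entry_sketch`, so the manifest needs no separate hashing code of its own.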

+struct oidset2_entry {
+	struct hashmap_entry hash;
+	struct object_id oid;
+
+	enum object_type type;
+	int64_t object_length;	/* This is SIGNED. Use -1 when unknown. */
+	char *pathname;
+};

object_length is defined to be "unsigned long" in Git code, I think.
When is object_length not known, and in those cases, would it be better
to use a separate data structure to store what we need?

Yeah, I struggled with that one.  Git currently treats file size as
a 32-bit unsigned value throughout the code.  I assume eventually there
will be a round of changes to support 64-bit values, so this anticipates
that.

I could change it to use an "unknown" flag rather than assuming -1, but in an
earlier draft I was printing -1 in the rev-list output.  I can change this.
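A rough sketch of the flag alternative, under assumptions: the length keeps Git's usual `unsigned long` type and "unknown" is recorded in a separate bit instead of a signed -1 sentinel. The struct and function names are hypothetical. The "?" placeholder for an unknown size is an arbitrary choice for this sketch, not an established rev-list output format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical replacement for the int64_t / -1 sentinel field. */
struct omitted_length_sketch {
	unsigned long object_length;   /* meaningful only when length_known */
	unsigned length_known : 1;     /* replaces the -1 "unknown" marker */
};

static void length_set(struct omitted_length_sketch *l, unsigned long len)
{
	l->object_length = len;
	l->length_known = 1;
}

static void length_clear(struct omitted_length_sketch *l)
{
	l->object_length = 0;
	l->length_known = 0;
}

/* Render the size column for output; returns snprintf's result.
 * "?" for an unknown size is a placeholder picked for this sketch. */
static int length_format(const struct omitted_length_sketch *l,
			 char *buf, size_t bufsz)
{
	assert(buf != NULL && bufsz > 0);
	if (!l->length_known)
		return snprintf(buf, bufsz, "?");
	return snprintf(buf, bufsz, "%lu", l->object_length);
}
```

The trade-off: the flag costs a little extra storage, but the length stays the same unsigned type as the rest of Git's size handling, and the printing code decides explicitly what "unknown" looks like instead of leaking a -1 into the output.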

WRT a separate structure: the set I create contains entries for items
whose size may or may not be known, depending on the context.
When building a list of already-missing blobs (with --filter-print-missing),
we never know the size.  But when building a list of to-be-omitted blobs
(from the current set of filter options), we may or may not know it.  I'm
not sure we need two _entry definitions right now.
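To illustrate why one entry type can serve both contexts, here is a small hypothetical sketch. A single struct with an optional length is populated by two constructors. One is for already-missing blobs, whose size is never available. The other is for filter-omitted blobs, whose size may or may not have been looked up. The `source` field and all names are illustrative assumptions, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OID_RAWSZ 20   /* assumption: SHA-1 sized object ids */

/* Hypothetical tag recording which context produced the entry. */
enum manifest_source {
	MANIFEST_MISSING,   /* already absent locally (--filter-print-missing) */
	MANIFEST_OMITTED    /* skipped by the active filter options */
};

struct manifest_entry_sketch {
	unsigned char oid[OID_RAWSZ];
	enum manifest_source source;
	unsigned have_length : 1;
	unsigned long length;          /* meaningful only if have_length */
};

/* A blob that is already missing has no local data to inspect,
 * so its length is never known. */
static void manifest_init_missing(struct manifest_entry_sketch *e,
				  const unsigned char *oid)
{
	assert(oid != NULL);
	memcpy(e->oid, oid, OID_RAWSZ);
	e->source = MANIFEST_MISSING;
	e->have_length = 0;
	e->length = 0;
}

/* A blob omitted by the filter may or may not have had its size
 * looked up; pass NULL for length when it was not. */
static void manifest_init_omitted(struct manifest_entry_sketch *e,
				  const unsigned char *oid,
				  const unsigned long *length)
{
	assert(oid != NULL);
	memcpy(e->oid, oid, OID_RAWSZ);
	e->source = MANIFEST_OMITTED;
	e->have_length = length != NULL;
	e->length = length ? *length : 0;
}
```

Both paths yield the same entry shape, so a single set type can hold them. A second _entry definition would only pay off if the two contexts later needed genuinely different fields.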