Re: [PATCH/RFC v3 01/12] pack-objects: a bit of document about struct object_entry

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 09 Mar 2018 14:34:14 -0800

Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:

> The role of this comment block becomes more important after we shuffle
> fields around to shrink this struct. It will be much harder to see what
> field is related to what. This also documents the holes in this struct
> according to pahole.
>
> A couple of notes on shrinking the struct:
>
> 1) The reader may notice one thing from this document and the shrinking
> business. If "delta" is NULL, all other delta-related fields should be
> irrelevant. We could group all these in a separate struct and replace
> them all with a pointer to this struct (allocated separately).
>
> This does not help much though since 85% of objects are deltified
> (source: linux-2.6.git). The gain is only from non-delta objects, which
> is not that significant.

OK.

> 2) The field in_pack_offset and idx.offset could be merged. But we need
> to be very careful. Up until the very last phase (object writing),
> idx.offset is not used and can hold in_pack_offset. Then idx.offset will
> be updated with _destination pack's_ offset, not source's. But since we
> always write delta's bases first, and we only use in_pack_offset in
> writing phase when we reuse objects, we should be ok?

By separating the processing in strict phases, I do think the result
would be OK, but at the same time, that does smell like an
invitation for future bugs.

> +/*
> + * basic object info
> + * -----------------
> + * idx.oid is filled up before delta searching starts. idx.crc32 and
> + * is only valid after the object is written down and will be used for
> + * generating the index. idx.offset will be both gradually set and
> + * used in writing phase (base objects get offset first, then deltas
> + * refer to them)

Here, I'd feel that "written out" somehow would sound more natural
than "written down", but that is perhaps because I've seen it used
elsewhere and I am confusing familiarlity with naturalness.  In any
case, if we mean "written to the resulting packdata stream", saying
that to be more explicit is probably a good idea.  We compute crc32
and learn the offset for each object as we write them to the result.

> + * If a delta is cached in memory and is compressed, "delta" points to
> + * the data and z_delta_size contains the compressed size. If it's

Isn't it "delta_data" (aot "delta") that points at the cached delta
data?

> + * uncompressed [1], z_delta_size must be zero. delta_size is always
> + * the uncompressed size and must be valid even if the delta is not
> + * cached. Delta recreation technically only depends on "delta"
> + * pointer, but delta_size is still used to verify it's the same as
> + * before.
> + *
> + * [1] during try_delta phase we don't bother with compressing because
> + * the delta could be quickly replaced with a better one.
> + */
>  struct object_entry {
>  	struct pack_idx_entry idx;
>  	unsigned long size;	/* uncompressed size */
> @@ -28,6 +74,7 @@ struct object_entry {
>  	unsigned tagged:1; /* near the very tip of refs */
>  	unsigned filled:1; /* assigned write-order */
>  
> +	/* XXX 28 bits hole, try to pack */
>  	/*
>  	 * State flags for depth-first search used for analyzing delta cycles.
>  	 *
> @@ -40,6 +87,7 @@ struct object_entry {
>  		DFS_DONE
>  	} dfs_state;
>  	int depth;
> +	/* size: 136, padding: 4 */
>  };
>  
>  struct packing_data {