Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes: > The role of this comment block becomes more important after we shuffle > fields around to shrink this struct. It will be much harder to see what > field is related to what. This also documents the holes in this struct > according to pahole. > > A couple of notes on shrinking the struct: > > 1) The reader may notice one thing from this document and the shrinking > business. If "delta" is NULL, all other delta-related fields should be > irrelevant. We could group all these in a separate struct and replace > them all with a pointer to this struct (allocated separately). > > This does not help much though since 85% of objects are deltified > (source: linux-2.6.git). The gain is only from non-delta objects, which > is not that significant. OK. > 2) The field in_pack_offset and idx.offset could be merged. But we need > to be very careful. Up until the very last phase (object writing), > idx.offset is not used and can hold in_pack_offset. Then idx.offset will > be updated with _destination pack's_ offset, not source's. But since we > always write delta's bases first, and we only use in_pack_offset in > writing phase when we reuse objects, we should be ok? By separating the processing in strict phases, I do think the result would be OK, but at the same time, that does smell like an invitation for future bugs. > +/* > + * basic object info > + * ----------------- > + * idx.oid is filled up before delta searching starts. idx.crc32 and > + * is only valid after the object is written down and will be used for > + * generating the index. idx.offset will be both gradually set and > + * used in writing phase (base objects get offset first, then deltas > + * refer to them) Here, I'd feel that "written out" somehow would sound more natural than "written down", but that is perhaps because I've seen it used elsewhere and I am confusing familiarlity with naturalness. In any case, if we mean "written to the resulting packdata stream", saying that to be more explicit is probably a good idea. We compute crc32 and learn the offset for each object as we write them to the result. > + * If a delta is cached in memory and is compressed, "delta" points to > + * the data and z_delta_size contains the compressed size. If it's Isn't it "delta_data" (aot "delta") that points at the cached delta data? > + * uncompressed [1], z_delta_size must be zero. delta_size is always > + * the uncompressed size and must be valid even if the delta is not > + * cached. Delta recreation technically only depends on "delta" > + * pointer, but delta_size is still used to verify it's the same as > + * before. > + * > + * [1] during try_delta phase we don't bother with compressing because > + * the delta could be quickly replaced with a better one. > + */ > struct object_entry { > struct pack_idx_entry idx; > unsigned long size; /* uncompressed size */ > @@ -28,6 +74,7 @@ struct object_entry { > unsigned tagged:1; /* near the very tip of refs */ > unsigned filled:1; /* assigned write-order */ > > + /* XXX 28 bits hole, try to pack */ > /* > * State flags for depth-first search used for analyzing delta cycles. > * > @@ -40,6 +87,7 @@ struct object_entry { > DFS_DONE > } dfs_state; > int depth; > + /* size: 136, padding: 4 */ > }; > > struct packing_data {