"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > +Chunk-based file formats > +======================== > + > +Some file formats in Git use a common concept of "chunks" to describe > +sections of the file. This allows structured access to a large file by > +scanning a small "table of contents" for the remaining data. This common > +format is used by the `commit-graph` and `multi-pack-index` files. See > +link:technical/pack-format.html[the `multi-pack-index` format] and > +link:technical/commit-graph-format.html[the `commit-graph` format] for > +how they use the chunks to describe structured data. I've read the doc added here to the end; well written and easy to understand. I wonder how/if well reftable files fit in the scheme, or if it doesn't, should the chunk file format API be updated to accomodate it (or the other way around)? > +Extract the data information for each chunk using `pair_chunk()` or > +`read_chunk()`: > + > +* `pair_chunk()` assigns a given pointer with the location inside the > + memory-mapped file corresponding to that chunk's offset. If the chunk > + does not exist, then the pointer is not modified. I think it is worth adding: The caller is expected to know where the returned chunk ends by some out-of-band means, as this function only gives the offset but not the size, unlike the read_chunk() function. > +* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it > + with the appropriate initial pointer and size information. The function > + is not called if the chunk does not exist. Use this method to read chunks > + if you need to perform immediate parsing or if you need to execute logic > + based on the size of the chunk. > + > +After calling these methods, call `free_chunkfile()` to clear the > +`struct chunkfile` data. This will not close the memory-mapped region. > +Callers are expected to own that data for the timeframe the pointers into > +the region are needed. Thanks.