Re: [PATCH v4 17/17] chunk-format: add technical docs

"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> +Chunk-based file formats
> +========================
> +
> +Some file formats in Git use a common concept of "chunks" to describe
> +sections of the file. This allows structured access to a large file by
> +scanning a small "table of contents" for the remaining data. This common
> +format is used by the `commit-graph` and `multi-pack-index` files. See
> +link:technical/pack-format.html[the `multi-pack-index` format] and
> +link:technical/commit-graph-format.html[the `commit-graph` format] for
> +how they use the chunks to describe structured data.

I've read the doc added here through to the end; it is well written
and easy to understand.

I wonder how well (if at all) reftable files fit into this scheme.
If they don't, should the chunk file format API be updated to
accommodate them (or the other way around)?

> +Extract the data information for each chunk using `pair_chunk()` or
> +`read_chunk()`:
> +
> +* `pair_chunk()` assigns a given pointer with the location inside the
> +  memory-mapped file corresponding to that chunk's offset. If the chunk
> +  does not exist, then the pointer is not modified.

I think it is worth adding:

    The caller is expected to know where the returned chunk ends by
    some out-of-band means, as this function only gives the offset
    but not the size, unlike the read_chunk() function.

> +* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
> +  with the appropriate initial pointer and size information. The function
> +  is not called if the chunk does not exist. Use this method to read chunks
> +  if you need to perform immediate parsing or if you need to execute logic
> +  based on the size of the chunk.
> +
> +After calling these methods, call `free_chunkfile()` to clear the
> +`struct chunkfile` data. This will not close the memory-mapped region.
> +Callers are expected to own that data for the timeframe the pointers into
> +the region are needed.
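
To make the difference between the two accessors concrete, here is a
minimal standalone sketch. It is not the code from this series: the
toy_* names, the in-memory table standing in for the table of
contents, and the "0 if found, 1 if missing" return convention are
all assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the declarations from chunk-format.h. */
struct toy_chunk {
	uint32_t id;
	const unsigned char *start;	/* points into the mapped file */
	size_t size;
};

struct toy_chunkfile {
	const struct toy_chunk *chunks;	/* parsed table of contents */
	size_t nr;
};

typedef int (*toy_chunk_read_fn)(const unsigned char *chunk_start,
				 size_t chunk_size, void *data);

static const struct toy_chunk *toy_find(const struct toy_chunkfile *cf,
					uint32_t id)
{
	for (size_t i = 0; i < cf->nr; i++)
		if (cf->chunks[i].id == id)
			return &cf->chunks[i];
	return NULL;
}

/*
 * Mirrors the documented pair_chunk() behaviour: on success *p is set
 * to the start of the chunk; if the chunk is missing, *p is left
 * untouched.  Note that no size is reported.
 * Return convention (0 found, 1 missing) is an assumption.
 */
static int toy_pair_chunk(const struct toy_chunkfile *cf, uint32_t id,
			  const unsigned char **p)
{
	const struct toy_chunk *c = toy_find(cf, id);
	if (!c)
		return 1;
	*p = c->start;
	return 0;
}

/*
 * Mirrors the documented read_chunk() behaviour: the callback gets
 * both the start pointer and the size, and is not called at all if
 * the chunk does not exist.
 */
static int toy_read_chunk(const struct toy_chunkfile *cf, uint32_t id,
			  toy_chunk_read_fn fn, void *data)
{
	const struct toy_chunk *c = toy_find(cf, id);
	if (!c)
		return 1;
	return fn(c->start, c->size, data);
}

/* Example callback: remember the size so it can be bounds-checked. */
static int toy_record_size(const unsigned char *chunk_start,
			   size_t chunk_size, void *data)
{
	(void)chunk_start;
	*(size_t *)data = chunk_size;
	return 0;
}
```

With this shape, a caller of toy_pair_chunk() has to learn the chunk
length from elsewhere (say, a fixed record width times an entry count
taken from the file header), whereas a callback passed to
toy_read_chunk() can validate chunk_size before it parses anything.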

Thanks.


