"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > +/* > + * When writing a chunk-based file format, collect the chunks in > + * an array of chunk_info structs. The size stores the _expected_ > + * amount of data that will be written by write_fn. > + */ > +struct chunk_info { > + uint32_t id; > + uint64_t size; > + chunk_write_fn write_fn; > +}; > +... > +void add_chunk(struct chunkfile *cf, > + uint64_t id, > + chunk_write_fn fn, > + size_t size) > +{ > + ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc); > + > + cf->chunks[cf->chunks_nr].id = id; > + cf->chunks[cf->chunks_nr].write_fn = fn; > + cf->chunks[cf->chunks_nr].size = size; > + cf->chunks_nr++; > +} Somebody somewhere between the caller in the higher part of the callchain (that has to work with platform native types) and the on-disk format at the bottom of the callchain (that has to work with fixed size data fields) needs to make sure that the size that the higher level caller has fits on-disk data structure we define, and the data we read from disk fits the in-core structure our caller use on the reading side. If there is a function at the one level closer to the disk than "struct chunk_info" that takes a "struct chunk_info" and writes the id and size to disk (and fills "struct chunk_info" from what is read from the disk, on the reading side), it would be a good place to do the size_t to uint64_t conversion. It is OK to do the conversion-while-checking in add_chunk(), too. But a silent type casting from size_t to uint64_t done silently by assignment bothers me. Also, I think you meant to make the incoming ID uint32_t; am I missing something, or did nobody notice it in the review of the previous round? > +int write_chunkfile(struct chunkfile *cf, void *data) > +{ > + int i; > + size_t cur_offset = cf->f->offset + cf->f->total; That ought to be off_t, as it is a seek position inside a file (struct hashfile.total is already off_t). Use csum-file.h::hashfile_total() instead, perhaps? .offset member is an implementation detail of the hashfile API (i.e. some leftover bits are kept in in-core buffer, until we accumulate enough to make it worth flushing to the disk), and by using the helper, this code does not have to know about it. > + /* Add the table of contents to the current offset */ > + cur_offset += (cf->chunks_nr + 1) * CHUNK_LOOKUP_WIDTH; Is that 12 == sizeof(chunk_info.id) + sizeof(chunk_info.size)? If so, this makes sense. > + for (i = 0; i < cf->chunks_nr; i++) { > + hashwrite_be32(cf->f, cf->chunks[i].id); > + hashwrite_be64(cf->f, cur_offset); > + > + cur_offset += cf->chunks[i].size; > + } > + > + /* Trailing entry marks the end of the chunks */ > + hashwrite_be32(cf->f, 0); > + hashwrite_be64(cf->f, cur_offset); OK. This helper does not tell us anything about what comes in the on-disk file before this point, but we write a table of contents that says "chunk with this ID has this size, chunk with that ID has that size, ...", concluded by something that looks like another entry with chunk ID 0 that records the current offset as its size. > + for (i = 0; i < cf->chunks_nr; i++) { > + uint64_t start_offset = cf->f->total + cf->f->offset; Likewise about the type and use of hashfile_total(). > + int result = cf->chunks[i].write_fn(cf->f, data); > + > + if (result) > + return result; > + > + if (cf->f->total + cf->f->offset - start_offset != cf->chunks[i].size) > + BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", > + cf->chunks[i].size, cf->chunks[i].id, > + cf->f->total + cf->f->offset - start_offset); I won't complain, as this apparently is sufficient to abstract out the two existing chunked format files, but it somehow feels a bit limiting that the one that calls add_chunk() is required to know what the size of generated data would be, way before .write_fn() is called to produce the actual data here. > + } > + > + return 0; > +} > diff --git a/chunk-format.h b/chunk-format.h > new file mode 100644 > index 00000000000..bfaed672813 > --- /dev/null > +++ b/chunk-format.h > @@ -0,0 +1,20 @@ > +#ifndef CHUNK_FORMAT_H > +#define CHUNK_FORMAT_H > + > +#include "git-compat-util.h" > + > +struct hashfile; > +struct chunkfile; > + > +struct chunkfile *init_chunkfile(struct hashfile *f); > +void free_chunkfile(struct chunkfile *cf); > +int get_num_chunks(struct chunkfile *cf); > +typedef int (*chunk_write_fn)(struct hashfile *f, > + void *data); Write this on a single line. > +void add_chunk(struct chunkfile *cf, > + uint64_t id, > + chunk_write_fn fn, > + size_t size); Shouldn't this match the order of members in chunk_info struct? > +int write_chunkfile(struct chunkfile *cf, void *data); > + > +#endif