So here's my proposed file format for the zchunk file. Should I add
some flags to facilitate possible different compression formats?

+-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
|  ID   |  Index size   | Compressed Index | Compressed Dict |
+-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+

+===========+===========+
|   Chunk   |   Chunk   | ==> More chunks
+===========+===========+

ID
    '\0ZCK', identifies file as zchunk file

Index size
    This is a 64-bit unsigned integer containing the size of the
    compressed index.

Compressed Index
    This is the index, which is described in the next section. The
    index is compressed using standard zstd compression without a
    custom dictionary.

Compressed Dict
    This is a custom dictionary used when compressing each chunk.
    Because each chunk is compressed completely separately from the
    others, the custom dictionary gives us much better overall
    compression. The custom dictionary is itself compressed using
    standard zstd compression without a separate custom dictionary
    (for obvious reasons).

Chunk
    This is a chunk of data, compressed using zstd with the custom
    dictionary provided above.


The index:

+++++++++++++++++++++++++++++++-+-+-+-+-+-+-+-+
|          sha256sum          |  End of dict  |
+++++++++++++++++++++++++++++++-+-+-+-+-+-+-+-+

+++++++++++++++++++++++++++++++-+-+-+-+-+-+-+-+
|          sha256sum          |  End of chunk | ==> More
+++++++++++++++++++++++++++++++-+-+-+-+-+-+-+-+

sha256sum of compressed dict
    This is a binary sha256sum of the compressed dict, used to detect
    whether two dicts are identical.

End of dict
    This is the offset of the end of the dict, measured from the end
    of the index (so 0 is the end of the index). This gives us the
    information we need to find and decompress the dict.

sha256sum of compressed chunk
    This is a binary sha256sum of the compressed chunk, used to detect
    whether any two chunks are identical.

End of chunk
    This is the offset of the end of the chunk, measured from the end
    of the index (so 0 is the end of the index). This gives us the
    information we need to find and decompress each chunk.
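To make the layout above concrete, here is a rough Python sketch of reading the header and walking the index. It is only an illustration under some assumptions the proposal doesn't settle: the byte order of the 64-bit fields isn't specified, so little-endian is assumed, and "End of dict" / "End of chunk" are taken to be 64-bit as the diagram suggests. The zstd step is left out entirely, so parse_index() expects the index after decompression. All function names are made up for the example.

```python
import hashlib
import struct

MAGIC = b"\0ZCK"
# Assumption: little-endian; the proposal does not state a byte order.
HEADER = struct.Struct("<4sQ")   # ID + index size (12 bytes)
ENTRY = struct.Struct("<32sQ")   # sha256sum + end offset (40 bytes)


def parse_header(data: bytes) -> int:
    """Check the ID and return the size of the compressed index."""
    magic, index_size = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a zchunk file")
    return index_size


def build_index(compressed_dict: bytes, compressed_chunks: list) -> bytes:
    """Build the uncompressed index: the dict entry, then one per chunk."""
    out = bytearray()
    end = 0  # offsets are counted from the end of the index
    for blob in [compressed_dict, *compressed_chunks]:
        end += len(blob)
        out += ENTRY.pack(hashlib.sha256(blob).digest(), end)
    return bytes(out)


def parse_index(index: bytes) -> list:
    """Return (sha256, start, end) per entry; entry 0 is the dict.

    Each entry starts where the previous one ended, which is why
    storing only the end offset is enough.
    """
    if len(index) % ENTRY.size:
        raise ValueError("index length is not a multiple of the entry size")
    entries = []
    start = 0
    for digest, end in ENTRY.iter_unpack(index):
        if end < start:
            raise ValueError("end offsets must not decrease")
        entries.append((digest, start, end))
        start = end
    return entries
```

Since only end offsets are stored, the start of each entry is implied by the end of the one before it, with the dict starting at 0 (the end of the index).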
The index is designed so that it can be extracted from the file on the
server and downloaded separately, which makes it possible to download
only the parts of the file that are needed. It must then be re-embedded
when assembling the file, so the user only needs to keep one file.
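As a sketch of how a client might use a separately downloaded index to fetch only what it needs: compare the sha256sums in the new index against the ones it already has, and request byte ranges for the rest. This is just one possible approach, not part of the proposal. The function name and the (sha256, start, end) tuple shape are hypothetical, and offsets are relative to the end of the index as described above.

```python
def ranges_to_fetch(remote_entries, local_digests):
    """Return (start, end) ranges for entries we don't already hold.

    remote_entries: iterable of (sha256, start, end) from the new index.
    local_digests:  set of sha256 digests present in the local file.
    Adjacent missing entries are merged into a single range.
    """
    ranges = []
    for digest, start, end in remote_entries:
        if digest in local_digests or start == end:
            continue  # already have it, or nothing to download
        if ranges and ranges[-1][1] == start:
            # Extends the previous range: merge to save a request.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


remote = [(b"d", 0, 4), (b"a", 4, 7), (b"b", 7, 12), (b"c", 12, 20)]
print(ranges_to_fetch(remote, {b"d", b"b"}))  # [(4, 7), (12, 20)]
```

To turn these into positions in the full file, the client would add the header size and the compressed index size to each offset.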