Vicent Marti <tanoku@xxxxxxxxx> writes:

> This is the technical documentation and design rationale for the new
> Bitmap v2 on-disk format.

Hrmpf, that's what I get for reading the series in order...

> + The folowing flags are supported:
          ^^
typos marked by ^

> + By storing all the hashes in a cache together with the bitmapsin
                                                                 ^^
> + The obvious consequence is that the XOR of all 4 bitmaps will result
> + in a full set (all bits sets), and the AND of all 4 bitmaps will
                               ^

> + - 1-byte XOR-offset
> + The xor offset used to compress this bitmap. For an entry
> + in position `x`, a XOR offset of `y` means that the actual
> + bitmap representing for this commit is composed by XORing the
> + bitmap for this entry with the bitmap in entry `x-y` (i.e.
> + the bitmap `y` entries before this one).
> +
> + Note that this compression can be recursive. In order to
> + XOR this entry with a previous one, the previous entry needs
> + to be decompressed first, and so on.
> +
> + The hard-limit for this offset is 160 (an entry can only be
> + xor'ed against one of the 160 entries preceding it). This
> + number is always positivea, and hence entries are always xor'ed
                             ^
> + with **previous** bitmaps, not bitmaps that will come afterwards
> + in the index.

Clever.  Why 160 though?

> + - 2 bytes of RESERVED data (used right now for better packing).

What do they mean?

> + With an index at the end of the file, we can load only this index in memory,
> + allowing for very efficient access to all the available bitmaps lazily (we
> + have their offsets in the mmaped file).

Is there anything preventing you from mmap()ing the index also?
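
For what it's worth, here is how I read the recursive XOR-offset rule quoted
above, as a minimal Python sketch rather than anything taken from the patch.
The names `resolve_bitmap`, `entries` and `cache` are made up for
illustration, plain integers stand in for decompressed EWAH bitmaps, and I am
assuming that an offset of 0 means "stored uncompressed", which the quoted
text does not say.

```python
MAX_XOR_OFFSET = 160  # hard limit quoted from the patch text


def resolve_bitmap(entries, x, cache=None):
    """Return the actual bitmap for the entry at position x.

    entries is a list of (xor_offset, stored_bits) pairs in index order.
    stored_bits is a Python int standing in for a decompressed EWAH bitmap.
    Assumption: xor_offset == 0 means the entry is not XOR-compressed.
    """
    if cache is None:
        cache = {}

    # Walk backwards along the XOR chain until we hit an entry that is
    # already resolved or one that is stored without an XOR base.
    chain = []
    i = x
    while i not in cache:
        offset, _ = entries[i]
        if not 0 <= offset <= MAX_XOR_OFFSET or offset > i:
            raise ValueError(f"entry {i}: bad XOR offset {offset}")
        chain.append(i)
        if offset == 0:
            break
        i -= offset  # the base is always a *previous* entry

    # Unwind the chain, deepest entry first, so that each base is
    # resolved before the entry that depends on it.
    for i in reversed(chain):
        offset, stored = entries[i]
        cache[i] = stored if offset == 0 else stored ^ cache[i - offset]
    return cache[x]


# Entry 2 is XORed against entry 1, which is itself XORed against entry 0.
entries = [(0, 0b1010), (1, 0b0110), (1, 0b0001)]
resolved = [resolve_bitmap(entries, i) for i in range(len(entries))]
```

Passing the same `cache` dict across calls avoids re-walking long chains,
which seems to be the point of bounding the offset in the first place.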

> +== Appendix A: Serialization format for an EWAH bitmap
> +
> +Ewah bitmaps are serialized in the protocol as the JAVAEWAH
> +library, making them backwards compatible with the JGit
> +implementation:
> +
> + - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
> +
> + - 4-byte number of words of the COMPRESSED bitmap, when stored
> +
> + - N x 8-byte words, as specified by the previous field
> +
> + This is the actual content of the compressed bitmap.
> +
> + - 4-byte position of the current RLW for the compressed
> + bitmap
> +
> +Note that the byte order for this serialization is not defined by
> +default. The byte order for all the content in a serialized EWAH
> +bitmap can be known by the byte order flags in the header of the
> +bitmap index file.

Please document the RLW format here.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
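
To make the field layout in Appendix A concrete, here is a minimal Python
sketch of a reader for one serialized bitmap. It only follows the four fields
listed in the quoted text. The function name `parse_ewah` and the `byteorder`
parameter are hypothetical, and the caller is assumed to have already worked
out the byte order from the index header flags. It does not interpret the
RLWs inside the words, since that format is not described in the quoted text.

```python
import struct


def parse_ewah(buf, offset=0, byteorder=">"):
    """Read one serialized EWAH bitmap from buf, starting at offset.

    byteorder is a struct prefix (">" big-endian, "<" little-endian);
    the caller is assumed to derive it from the index header flags.
    Returns (fields, new_offset).
    """
    # 4-byte bit count of the uncompressed bitmap, then
    # 4-byte word count of the compressed bitmap.
    bit_size, word_count = struct.unpack_from(byteorder + "II", buf, offset)
    offset += 8

    # N x 8-byte words holding the compressed content.
    words = struct.unpack_from(f"{byteorder}{word_count}Q", buf, offset)
    offset += 8 * word_count

    # 4-byte position of the current RLW within the words.
    (rlw_pos,) = struct.unpack_from(byteorder + "I", buf, offset)
    offset += 4

    fields = {"bit_size": bit_size, "words": words, "rlw_pos": rlw_pos}
    return fields, offset


# Round-trip a hand-built blob: 70 bits, two words, RLW at position 0.
blob = struct.pack(">II2QI", 70, 2, 0x1, 0xFF, 0)
parsed, end = parse_ewah(blob)
```

Returning the new offset lets a caller read consecutive bitmaps out of one
mmap()ed buffer without copying.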