In addition to the four basic object types (commit, tree, blob and tag), the pack stream can encode a few other "representation" types, such as REF_DELTA and OFS_DELTA. We allocate only 3 bits in the first byte for this purpose, so there is not much room to add new representation types in place. We do, however, have one value reserved for future expansion. This patch defines how that reserved value is used.

For the current representation types, the first byte of each object in the pack stream is laid out as follows:

 - Bits 0-3 hold the low 4 bits of "some" size (not necessarily the size of the representation);

 - Bits 4-6 hold the object type, 0-7. We have not used type 5 so far and have reserved it for future expansion. (We could also use a type 0 recorded in the pack stream for future expansion, in the same way this patch turns 5 into the real "extended" representation type.);

 - Bit 7 signals whether the second byte needs to be read, for sizes that do not fit in 4 bits.

When bits 4-6 encode type 5, the first byte is used this way:

 - Bits 0-3 denote the real "extended" representation type. Because types 0-7 can already be encoded without the extended format, we offset the type by 8 (i.e. if bits 0-3 say 3, the representation type is 11 = 3 + 8);

 - Bits 4-6 hold the value 5;

 - Bit 7 signals whether the _third_ byte needs to be read, for sizes that do not fit in 8 bits.

It is unlikely that we will pack anything that does not need to record a size, so the second byte is always used in full to encode the low 8 bits of the size.

I haven't started using type 8 and upwards for anything yet. Because we have only one "future expansion" value left, though, I want us to be extremely careful not to paint ourselves into a corner we cannot get out of. That is why I am sending this out early for a preliminary review.
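The two byte layouts above can be made concrete with a small encoder. The following is a minimal C sketch, not code from git or from this patch. `encode_object_header()` and `PACK_EXT_ESCAPE` are hypothetical names. The sketch assumes that the size bytes after the first byte (or, for extended types, after the second byte) continue to use the existing 7-bit, low-bits-first encoding with bit 7 as the "more follows" flag.

```c
#include <stddef.h>

/* Hypothetical name for the reserved escape value (OBJ_EXT in the patch). */
#define PACK_EXT_ESCAPE 5

/*
 * Hypothetical helper, not part of git: write a pack object header for
 * `type` and `size` into `out` and return the number of bytes written,
 * or 0 if the type cannot be represented. 16 bytes of room is enough
 * for a 64-bit size.
 */
static size_t encode_object_header(unsigned char *out, unsigned type,
				   unsigned long size)
{
	size_t n = 0;
	unsigned char first;
	unsigned char low = 0;
	int extended = type >= 8;

	/* 5 is the escape itself; 4 bits of offset allow only types 8..23. */
	if (type == PACK_EXT_ESCAPE || type >= 8 + 16)
		return 0;

	if (!extended) {
		/* Bits 4-6: type; bits 0-3: low 4 bits of the size. */
		first = (unsigned char)((type << 4) | (size & 15));
		size >>= 4;
	} else {
		/* Bits 4-6: the escape value 5; bits 0-3: type - 8. */
		first = (unsigned char)((PACK_EXT_ESCAPE << 4) | (type - 8));
		/* The second byte always carries the low 8 bits of the size. */
		low = (unsigned char)(size & 0xff);
		size >>= 8;
	}

	/* Bit 7 of the first byte: more size bytes follow. */
	out[n++] = (unsigned char)(first | (size ? 0x80 : 0));
	if (extended)
		out[n++] = low; /* all 8 bits are size, no continuation flag */

	/* Remaining size bits, 7 at a time, low bits first; bit 7 = "more". */
	while (size) {
		unsigned char c = (unsigned char)(size & 0x7f);
		size >>= 7;
		out[n++] = (unsigned char)(c | (size ? 0x80 : 0));
	}
	return n;
}
```

Consider a blob (type 3) of 300 bytes. It encodes as 0xBC 0x12. A hypothetical extended type 11 of the same size encodes as 0xD3 0x2C 0x01. The escape costs one extra byte here because the second byte spends all 8 bits on the size and carries no continuation flag.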
Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 cache.h     |    3 ++-
 sha1_file.c |   36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 2e6ad36..b02139b 100644
--- a/cache.h
+++ b/cache.h
@@ -380,9 +380,10 @@ enum object_type {
 	OBJ_TREE = 2,
 	OBJ_BLOB = 3,
 	OBJ_TAG = 4,
-	/* 5 for future expansion */
+	OBJ_EXT = 5, /* 5 for future expansion */
 	OBJ_OFS_DELTA = 6,
 	OBJ_REF_DELTA = 7,
+	OBJ_CAT_TREE = 8,
 	OBJ_ANY,
 	OBJ_MAX
 };
diff --git a/sha1_file.c b/sha1_file.c
index 27f3b9b..4dcd023 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1254,16 +1254,43 @@ static int experimental_loose_object(unsigned char *map)
 }
 
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
-		unsigned long len, enum object_type *type, unsigned long *sizep)
+		unsigned long len, enum object_type *typep, unsigned long *sizep)
 {
 	unsigned shift;
 	unsigned long size, c;
 	unsigned long used = 0;
+	enum object_type type;
 
+	/*
+	 * MSB of the first byte is used to tell if the second byte
+	 * needs to be read for the size, so type field is only 3-bit
+	 * wide.
+	 */
 	c = buf[used++];
-	*type = (c >> 4) & 7;
-	size = c & 15;
-	shift = 4;
+	type = (c >> 4) & 7;
+
+	if (type != OBJ_EXT) {
+		/*
+		 * For basic types of object representations, the low
+		 * 4-bit of the first byte is used for the lowermost
+		 * 4-bit of the size. The MSB of the first byte tells
+		 * if the second byte needs to be read for size.
+		 */
+		size = c & 15;
+		shift = 4;
+	} else {
+		/*
+		 * For extended types, the low 4-bit of the first byte
+		 * is used for the representation type (offset by 8),
+		 * and the size begins at the second byte. The MSB of
+		 * the first byte is still used to indicate the next
+		 * byte (i.e. the third byte) needs to be read for the
+		 * size.
+		 */
+		type = (c & 15) + 8;
+		size = buf[used++];
+		shift = 8;
+	}
 	while (c & 0x80) {
 		if (len <= used || bitsizeof(long) <= shift) {
 			error("bad object header");
@@ -1274,6 +1301,7 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		shift += 7;
 	}
 	*sizep = size;
+	*typep = type;
 	return used;
 }
 
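The patched parser can also be exercised outside git. The following standalone sketch mirrors the logic of the patched `unpack_object_header_buffer()`, with a few assumptions:

 - `decode_object_header()` and `PACK_EXT_ESCAPE` are hypothetical names.
 - git's `enum object_type`, `bitsizeof()` and `error()` are replaced by plain C stand-ins.
 - The loop body between the two hunks, which the diff does not show, is assumed to be the usual "read next byte, add its low 7 bits at `shift`" step.
 - Unlike the patch, the sketch checks `len` before reading the second byte of an extended header instead of leaving that to the caller.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical name for the reserved escape value (OBJ_EXT in the patch). */
#define PACK_EXT_ESCAPE 5

/*
 * Sketch of the patched unpack_object_header_buffer() outside git.
 * Returns the number of header bytes consumed, or 0 on a bad header.
 */
static unsigned long decode_object_header(const unsigned char *buf,
		unsigned long len, unsigned *typep, unsigned long *sizep)
{
	unsigned shift;
	unsigned long size, c;
	unsigned long used = 0;
	unsigned type;

	if (len < 1)
		return 0;
	c = buf[used++];
	type = (c >> 4) & 7;	/* bit 7 is the continuation flag */

	if (type != PACK_EXT_ESCAPE) {
		/* Basic types: low 4 bits of byte 1 are the low 4 bits of size. */
		size = c & 15;
		shift = 4;
	} else {
		/* Extended: low 4 bits are type - 8; byte 2 is the low 8 bits of size. */
		if (len <= used)
			return 0;	/* explicit check; the patch leaves this to the caller */
		type = (c & 15) + 8;
		size = buf[used++];
		shift = 8;
	}

	/*
	 * c still holds the first byte here (the second byte of an extended
	 * header is never tested), so its bit 7 decides whether a 7-bit
	 * group follows; after that, each group's bit 7 does.
	 */
	while (c & 0x80) {
		if (len <= used || sizeof(long) * CHAR_BIT <= shift)
			return 0;	/* truncated or oversized header */
		c = buf[used++];
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	*typep = type;
	return used;
}
```

Given the bytes 0xD3 0x2C 0x01, the sketch consumes three bytes and reports type 11 (3 + 8) and size 300. If the input is truncated to 0xD3 0x2C, it returns 0.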