On Sun, Jul 04, 2021 at 11:42:03PM +0100, Matthew Wilcox wrote:
> On Sun, Jul 04, 2021 at 11:20:07PM +0100, Gary Guo wrote:
> > This is big endian.
>
> Fundamentally, it doesn't matter whether it's encoded as top-7 +
> bottom-8 or bottom-7 + top-8.  It could just as well be:
>
> 	if (len >= 128) {
> 		len -= 128;
> 		len += *data * 256;
> 		data++;
> 	}
>
> It doesn't matter whether it's compatible with some other encoding.
> This encoding has one producer and one consumer.  As long as they agree,
> it's fine.  If you want to make an argument about extensibility, then
> I'm going to suggest that wanting a symbol name more than 32kB in size
> is a sign you've done something else very, very wrong.
>
> At that point, you should probably switch to comparing hashes of the
> symbol instead of the symbol.  Indeed, I think we're already there at
> 300 byte symbols; we should probably SipHash the full, unmangled symbol
> [1].  At 33k symbols in the current kernel, the risk of a collision of
> a 64-bit value is negligible, and almost every kernel symbol is longer
> than 7 bytes (thankfully).

We really should have a better standard varint encoding - open coding
varint encodings in 2021 is offensive, and LEB128 is a poor choice because
it uses the high bit of _every_ byte. Here's the encoding I did for
bcachefs, which I nominate for a standard varint encoding, unless someone
knows of a way to do better:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/varint.c
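
To make the comparison concrete, here is a rough sketch of the two schemes
discussed above: a two-byte length in the "bottom-7 + top-8" layout, and a
plain unsigned LEB128 decoder. The function names (len_encode, len_decode,
uleb128_decode) are made up for illustration and come from neither the
kernel nor bcachefs; this is not the encoding in varint.c, and it assumes
the caller has already checked that the buffer is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only.
 *
 * Two-byte length, "bottom-7 + top-8": lengths below 128 take one byte.
 * Otherwise the first byte has its high bit set and carries the low
 * 7 bits, and the second byte carries the next 8 bits (max 32767).
 */
static size_t len_encode(uint8_t *out, unsigned len)
{
	assert(len <= 0x7fff);	/* 15 bits is all this scheme can hold */

	if (len < 128) {
		out[0] = (uint8_t)len;
		return 1;
	}
	out[0] = (uint8_t)(0x80 | (len & 0x7f));
	out[1] = (uint8_t)(len >> 7);
	return 2;
}

/* Returns the number of bytes consumed; only the first byte is tested. */
static size_t len_decode(const uint8_t *in, unsigned *len)
{
	unsigned v = in[0];

	if (v < 128) {
		*len = v;
		return 1;
	}
	*len = (v & 0x7f) | ((unsigned)in[1] << 7);
	return 2;
}

/*
 * Unsigned LEB128: every byte spends its high bit on a continuation
 * flag, so the decoder has to test each byte before it knows whether
 * to read another one. Stops after 10 bytes (enough for 64 bits).
 */
static size_t uleb128_decode(const uint8_t *in, uint64_t *val)
{
	uint64_t v = 0;
	size_t i = 0;
	uint8_t b;

	do {
		b = in[i];
		v |= (uint64_t)(b & 0x7f) << (7 * i);
		i++;
	} while ((b & 0x80) && i < 10);

	*val = v;
	return i;
}
```

With the two-byte scheme the decoder learns the total size from the first
byte alone. The LEB128 loop only finds the end by walking the bytes, which
is the per-byte high-bit cost complained about above.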