On Sun, 4 Jul 2021 22:04:49 +0100
Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> On Sun, Jul 04, 2021 at 10:27:40PM +0200, ojeda@xxxxxxxxxx wrote:
> > From: Miguel Ojeda <ojeda@xxxxxxxxxx>
> >
> > Rust symbols can become quite long due to namespacing introduced
> > by modules, types, traits, generics, etc.
> >
> > Increasing to 255 is not enough in some cases, and therefore
> > we need to introduce 2-byte lengths to the symbol table. We call
> > these "big" symbols.
> >
> > In order to avoid increasing all lengths to 2 bytes (since most
> > of them only require 1 byte, including many Rust ones), we use
> > length zero to mark "big" symbols in the table.
>
> What happened to my suggestion from last time of encoding symbols <
> 128 as 0-127 and symbols larger than that as (data[0] - 128) * 256 +
> data[1]) ?

Yeah, I agree that ULEB128 or a similar encoding scheme would be better
than using 0 as an escape byte. If ULEB128 is used and we restrict the
number of bytes to 2, it will encode lengths below 2**14 instead of
below 2**16 like the current scheme, but that should be sufficient
anyway.

- Gary
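
For comparison, here is a rough sketch of the three length encodings
mentioned in this thread: the zero-escape scheme from the patch, the
high-bit prefix scheme quoted above, and ULEB128 capped at 2 bytes. It
is illustrative Python rather than kernel code. All function names are
made up for this example, and the big-endian order of the 2-byte length
in the zero-escape scheme is an assumption, since the thread does not
state it.

```python
# Illustrative sketch only; every name here is hypothetical, not a kernel API.

def encode_escape(n):
    """Zero-escape scheme: one byte for 1..255, else a 0 marker plus 2 bytes.

    Assumption: the 2-byte length is stored big-endian.
    """
    if 1 <= n <= 0xFF:
        return bytes([n])
    if 0xFF < n <= 0xFFFF:
        return b"\x00" + n.to_bytes(2, "big")
    raise ValueError(f"length {n} out of range")


def decode_escape(data):
    """Return (length, bytes consumed)."""
    if data[0] != 0:
        return data[0], 1
    return int.from_bytes(data[1:3], "big"), 3


def encode_prefix(n):
    """High-bit prefix scheme: 0..127 in one byte, else 128 + high bits, low byte."""
    if 0 <= n < 128:
        return bytes([n])
    if n <= 0x7FFF:
        return bytes([128 + (n >> 8), n & 0xFF])
    raise ValueError(f"length {n} out of range")


def decode_prefix(data):
    """Return (length, bytes consumed)."""
    if data[0] < 128:
        return data[0], 1
    return (data[0] - 128) * 256 + data[1], 2


def encode_uleb128_2(n):
    """ULEB128 restricted to 2 bytes: low 7 bits first, high bit = 'more'."""
    if 0 <= n < 0x80:
        return bytes([n])
    if n < 0x4000:
        return bytes([(n & 0x7F) | 0x80, n >> 7])
    raise ValueError(f"length {n} out of range")


def decode_uleb128_2(data):
    """Return (length, bytes consumed)."""
    if data[0] & 0x80 == 0:
        return data[0], 1
    return (data[0] & 0x7F) | (data[1] << 7), 2


if __name__ == "__main__":
    # Show how each scheme lays out a short and a long symbol length.
    for n in (100, 300):
        print(n,
              encode_escape(n).hex(),
              encode_prefix(n).hex(),
              encode_uleb128_2(n).hex())
```

In this sketch the zero-escape scheme spends 3 bytes on every length
above 255 and reaches 2**16 - 1, while the other two spend 2 bytes on
every length above 127 and reach 2**15 - 1 (prefix) and 2**14 - 1
(ULEB128).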