On Tue, 17 Dec 2024 at 14:52, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > Where the buf value holds the binary storage from vbin_printf() and written > in trace_vbprintk(). Yeah, it looks like it does depend on the arguments > being word aligned. So the thing is, they actually aren't word-aligned at all. Yes, the buffer is ostensibly individual words, and the buffer size is given in words, and 64-bit data is saved/fetched as two separate words. But if you look more closely, you'll see that the way the buffer is managed is actually not as a word array at all, but using char *str, *end; instead of word pointers. And the reason is because it does str = PTR_ALIGN(str, sizeof(type)); ... str += sizeof(type); so byte-sized data is *not* given a word, it's only given a single char, and then if it's followed by an "int", the pointer will be aligned at that point. It does mean that the binary buffers are a bit denser, but since %c and %hhd etc are VERY unusual (and often will be mixed with int-sized data), that denser format isn't commonly an actual advantage. And the reason I noticed this? When I was trying to clean up and simplify the vsnprintf() code to not have so many different cases, the *regular* number handling for char/short/int-sized cases ends up being just one case that looks like this: num = get_num(va_arg(args, int), fmt.state, spec); because char/short/int are all just "va_arg(args, int)" values. Then the actual printout depends on that printf_spec thing (and, in my experimental branch, that "fmt.state", but that's a local trial thing where I split the printf_spec differently for better code generation). So basically the core vfsprintf case doesn't need to care about fetching char/short/int, because va_args always just formats those kinds of arguments as int, as per the usual C implicit type expansion rules. But the binary printf thing then has to have three different cases, because unlike the normal calling convention, it actually packs char/short/int differently. So with all those nice cleanups I tried still look like this: case FORMAT_STATE_2BYTE: num = get_num(get_arg(short), fmt.state, spec); break; case FORMAT_STATE_1BYTE: num = get_num(get_arg(char), fmt.state, spec); break; default: num = get_num(get_arg(int), fmt.state, spec); break; which is admittedly still a lot better than the current situation in top-of-tree, which has separate versions for a *lot* more types. So right now top-of-tree has 11 different enumerated cases for number printing: FORMAT_TYPE_LONG_LONG, FORMAT_TYPE_ULONG, FORMAT_TYPE_LONG, FORMAT_TYPE_UBYTE, FORMAT_TYPE_BYTE, FORMAT_TYPE_USHORT, FORMAT_TYPE_SHORT, FORMAT_TYPE_UINT, FORMAT_TYPE_INT, FORMAT_TYPE_SIZE_T, FORMAT_TYPE_PTRDIFF and in my test cleanup, I've cut this down to just four cases: FORMAT_STATE_xBYTE (x = 1, 2, 4, 8). And the actual argument fetch for the *normal* case is actually just two cases (because the 8-byte and the "4 bytes or less" cases are different for va_list, but 1/2/4 bytes are just that single case). But the binary printf argument save/fetch is not able to be cut down to just two cases because of how it does that odd "ostensibly a word array, but packed byte/short fields" thing. Oh well. It's not a big deal. But I was doing this to see if I could regularize the vsnprintf() logic and make sharing better - and then just the binary version already causes unnecessary duplication. If the *only* thing that accesses that word array is vbin_printf and bstr_printf, then I could just change the packing to act like va_list does (ie the word array would *actually* be a word array, and char and short values would get individual words). But at least the bpf cde seems to know about the crazy packing, and also does that tmp_buf = PTR_ALIGN(tmp_buf, sizeof(u32)); in bpf_bprintf_prepare() because it knows that it's not *actually* an array of words despite it being documented as such. Of course, the bpf code only does the packed access thing for '%c', and doesn't seem to know that the same is true of '%hd' and '%hhd', presumably because nobody actually uses that. Let's add Alexei to the participants. I think bpf too would actually prefer that the odd char/short packing *not* be done, if only because it clearly does the wrong thing as-is for non-%c argument (ie those %hd/%hhd cases). Who else accesses that odd "binary printed pseudo-word array"? Linus