On 27/02/2024 19.32, Ville Syrjälä wrote:
> On Tue, Feb 27, 2024 at 10:38:10AM +0100, Rasmus Villemoes wrote:
>> On 26/02/2024 15.57, Jani Nikula wrote:
>>
>>> Personally I suck at remembering even the standard printf conversion
>>> specifiers, let alone all the kernel extensions. I basically have to
>>> look them up every time. I'd really love some %{name} format for named
>>> pointer things. And indeed preferably without the %p. Just %{name}.
>>
>> Sorry to spoil the fun, but that's a non-starter.
>>
>> foo.c: In function ‘foo’:
>> foo.c:5:24: warning: unknown conversion type character ‘{’ in format
>> [-Wformat=]
>>     5 |         printf("Hello %{function} World\n", &foo);
>>       |                        ^
>>
>> You can't start accepting stuff that -Wformat will warn about. We're not
>> going to start building with -Wno-format.
>
> Are there any sensible looking characters we could use for
> this? Ideally I'd like to have something to bracket the
> outsides, and perhaps a namespace separator in the middle.
>
> Or are we really forced into having essentially a random set
> of characters following just a %p/etc.?

You can't put anything after % that is not in the C standard (or POSIX), at least not until every supported compiler can be taught to register your own printf specifier along with the semantics of the expected varargs. And the only format specifier that accepts an arbitrary pointer is %p.

Now, as for what we put after %p: the reason we've ended up with a "random collection of letters" is (probably; I wasn't around when this was introduced) that you can very reasonably have a format string with %p followed by some punctuation that you intend to be output as-is (as a normal printf() implementation would do), whereas it would be weird to write "%pR" and expect output like "0x1234fedcR".
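The letter-versus-punctuation distinction above can be illustrated with a small user-space sketch. It is not the kernel's vsprintf.c: demo_format() and append() are hypothetical names, and instead of printing an address the demo renders each %p as "[p:<letters>]", so you can see which characters were taken as extension letters and which were copied literally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of s at out[n], never touching the last byte of out. */
static size_t append(char *out, size_t size, size_t n, const char *s,
		     size_t len)
{
	while (len > 0 && n + 1 < size) {
		out[n++] = *s++;
		len--;
	}
	return n;
}

/*
 * Hypothetical demo, not the kernel's vsprintf.c: copy fmt into out,
 * rendering each "%p" plus the run of alphanumerics right after it as
 * "[p:<letters>]". Everything else, including punctuation directly
 * after %p, is copied literally; "%%" becomes "%". No other
 * conversions are recognised.
 */
static void demo_format(char *out, size_t size, const char *fmt)
{
	size_t n = 0;

	assert(size > 0);
	while (*fmt) {
		if (fmt[0] == '%' && fmt[1] == '%') {
			n = append(out, size, n, "%", 1);
			fmt += 2;
		} else if (fmt[0] == '%' && fmt[1] == 'p') {
			const char *ext = fmt + 2;
			const char *end = ext;

			/* Alphanumerics modify the %p, so they are consumed. */
			while (isalnum((unsigned char)*end))
				end++;
			n = append(out, size, n, "[p:", 3);
			n = append(out, size, n, ext, (size_t)(end - ext));
			n = append(out, size, n, "]", 1);
			fmt = end;
		} else {
			n = append(out, size, n, fmt, 1);
			fmt++;
		}
	}
	out[n] = '\0';
}
```

With this rule, "res %pR, ok" becomes "res [p:R], ok", and "%p{foo}" becomes "[p:]{foo}": the brace is copied literally, which is precisely the existing behaviour that a bare %p{...} syntax would change.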
Hence the heuristic: any alphanumerics following %p are allowed to modify how that %p is handled, and the format string parser simply skips over them, all without making the compiler angry.

So the problem with introducing %p{some-thing} is that somebody could already have that %p (possibly with some existing alphanumeric extension(s)) followed by an opening curly brace that is meant to be printed literally. The same goes for any other punctuation character. You could probably grep to find most existing instances, but format strings split across two lines using implicit string concatenation won't be found that way, and there may be even more creative constructs.

That leaves something like %pX{}, i.e. designating some new letter to mean "hey, I want something much more readable, and please interpret what's inside {}". That's doable, and you could then put almost anything (except } and %) inside the braces. The format parser would just need to be taught that X is special and must skip ahead to the }, not just over alphanumerics.

>>> And then we could discuss adding support for drm specific things. I
>>> guess one downside is that the functions to do this would have to be in
>>> vsprintf.c instead of drm. Unless we add some code in drm for this
>>> that's always built-in.
>>
>> If people can be trusted to write callbacks with the proper semantics
>> for snprintf [1], we could do a generic
>
> Yeah, I was at some point thinking that having a version of
> register_printf_function() for printk() might be nice. The dangers
> being that we get conflicts between subsystems (*), or that it gets
> totally out of hand, or as you point out below people will start
> to do questionable things in there.
>
> (*) My earlier "include a subsystem namespace in the format"
> idea was basically how I was thinking of avoiding conflicts.
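A minimal sketch of the %pX{...} parsing described earlier, assuming (as in the mail) that X is only a placeholder for whichever new letter would be designated. parse_p_modifier() and copy_trunc() are hypothetical helpers, not kernel functions, and falling back to the old alphanumeric rule for an unterminated or %-containing brace group is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy len bytes of src into dst (capacity size), truncating if needed. */
static void copy_trunc(char *dst, size_t size, const char *src, size_t len)
{
	assert(size > 0);
	if (len >= size)
		len = size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Hypothetical: fmt points just past a "%p". Store the modifier in mod
 * and return a pointer to the first character after it.
 *
 * "X{...}": X stands for the new designated letter; everything up to the
 * closing '}' is the modifier. '}' and '%' cannot appear inside.
 * Anything else keeps today's rule: consume alphanumerics only.
 */
static const char *parse_p_modifier(const char *fmt, char *mod, size_t size)
{
	const char *end;

	if (fmt[0] == 'X' && fmt[1] == '{') {
		size_t len = strcspn(fmt + 2, "}%");

		if (fmt[2 + len] == '}') {
			copy_trunc(mod, size, fmt + 2, len);
			return fmt + 2 + len + 1;	/* skip the '}' too */
		}
		/* Unterminated or contains '%': assumed fallback below. */
	}
	end = fmt;
	while (isalnum((unsigned char)*end))
		end++;
	copy_trunc(mod, size, fmt, (size_t)(end - fmt));
	return end;
}
```

Given "X{drm:whatever} tail", the sketch stores "drm:whatever" and returns a pointer to " tail", while "R, more" keeps the old behaviour (modifier "R", rest ", more").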
So if we really want to go down this road, I think it should be something like %pX{drm:whatever}, with core printf just looking up the token "drm" in a run-time list of registered callbacks (because I don't want vsprintf.c filled with random subsystems' formatting code). That single callback would then be handed a pointer to the rest of the format string (i.e. the "whatever}..."), the pointer argument from the varargs, and the buf/end pair.

But then we're back to trusting that callback (which might of course multiplex based on the "whatever" part) to behave correctly. At that point we might as well skip the string parsing altogether and pass the callback and the pointer together in one struct (passed as a pointer to a compound literal), which also avoids tricky issues such as what to do about module unload versus unregistering of callbacks.

Rasmus
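A rough user-space sketch of that last alternative, under stated assumptions: struct printf_cb, mini_snprintf(), RECT_FMT(), struct rect and the use of a bare %pX as the trigger are all hypothetical, and the callback uses snprintf()-style (buf, size) semantics rather than the buf/end pair of vsprintf.c, to stay within portable C. To a compiler checking the format, "%pX" is just %p followed by a literal X, so passing a pointer keeps -Wformat quiet.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: the callback and the object travel together, so no
 * run-time registry (and no module-unload race) is needed.
 */
struct printf_cb {
	/* snprintf()-like: write at most size bytes, NUL included, and
	 * return the length the full output would have had. */
	int (*fn)(char *buf, size_t size, const void *obj);
	const void *obj;
};

/*
 * Toy formatter: plain text plus one conversion, "%pX", whose argument
 * is a const struct printf_cb *. Returns the untruncated length.
 */
static int mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	size_t pos = 0;
	va_list ap;

	va_start(ap, fmt);
	while (*fmt) {
		if (fmt[0] == '%' && fmt[1] == 'p' && fmt[2] == 'X') {
			const struct printf_cb *cb =
				va_arg(ap, const struct printf_cb *);
			size_t room = pos < size ? size - pos : 0;
			int len;

			assert(cb && cb->fn);
			len = cb->fn(room ? buf + pos : NULL, room, cb->obj);
			if (len > 0)
				pos += (size_t)len;
			fmt += 3;
		} else {
			/* Keep the last byte free for the terminating NUL. */
			if (pos + 1 < size)
				buf[pos] = *fmt;
			pos++;
			fmt++;
		}
	}
	va_end(ap);
	if (size > 0)
		buf[pos < size ? pos : size - 1] = '\0';
	return (int)pos;
}

/* Example object and its formatting callback (hypothetical). */
struct rect {
	int w, h;
};

static int rect_fmt(char *buf, size_t size, const void *obj)
{
	const struct rect *r = obj;

	return snprintf(buf, size, "%dx%d", r->w, r->h);
}

/* Builds the callback + pointer pair as a compound literal. */
#define RECT_FMT(r) (&(const struct printf_cb){ .fn = rect_fmt, .obj = (r) })
```

A call such as mini_snprintf(buf, sizeof(buf), "mode %pX ok", RECT_FMT(&r)) with r = { 1920, 1080 } produces "mode 1920x1080 ok". The compound literal lives until the end of the enclosing block, so nothing has to be registered or unregistered, though the callback still has to be trusted to honour the size limit.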