On Wed, Dec 28 2022, Alejandro Colomar wrote: > [[PGP Signed Part:Undecided]] > Hi, > > On 12/23/22 19:12, Linus Torvalds wrote: >> On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason >> <avarab@xxxxxxxxx> wrote: >>> >>> Whereas the fix here is a fix for a promise we're currently making >>> which hasn't been true since v4.18. >> Hah. Ack. Did anybody ever actually notice? >> I wonder if the newer limit of 64 characters for kworkers shouldn't >> even be mentioned at all, and if the 16-byte truncation for user space >> should also be just removed. >> Those limits should never have been some documented API, they were >> always just implementation details, after all. >> Linus > Sorry about the late reply, holidays. > I agree. A variable implementation detail like this doesn't provide > anything valuable to users; especially since there's no statbility > promise at all. I'd rewrite to just remove the (16) implementation > detail. > > Ævar, would you send an v2 that removes implementation details, rather > than fixing the details? Maybe, because I'm not sure I'm qualified to document this anymore. My current patch just extends the description to cover the 4.18 divergence. Let's separate a few things here: A. The long-standing docs promise that it's limited to 16 bytes B. Since 4.18 that hasn't been true for (some of) the kernel's own processes, where the limit's been 64. C. Was the part of "A" where a limit was documented at all a good idea in retrospect? D. If "C"'s a "no" (which seems to be the consensus) what should the docs say? E. I hadn't mentioned this before, but the docs for prctl()'s PR_SET_NAME document the same 16 byte limit. I think the current behavior since 4.18 is a broken userspace promise, although admittedly a minor/obscure one. I think even if going forward the documentation is deliberately ambiguous about it, it would make sense to briefly document the 16 and 64 byte limits as past limits, to at least help to explain why current code parsing "/proc/*/stat" seems to be confident in those (or more commonly, the 16 bytes). The code I wrote was rather anal about that promise, but e.g. looking at htop(1)'s source code they've got a total limit of 2048 for this sort of line (MAX_READ). I'm sure if I went fishing I could find other similar cases (and probably some lower ones). I don't think it would be good to just leave it ambiguous for those trying to use this interface. They might assume any of 16 bytes (from finding the prctl() PR_SET_NAME docs), 64 bytes (from reading kernel sources), 255 (maximum filename length) etc. Wouldn't the least bad thing be to: * Cover "A" and "B" in passing, i.e. explain past promised / implemented limits. * Note that this is no guarantee, but that... * ...we might use up to N, where N is some sane limit (1024? 2048? 4096?). So programs that parse this now could just increase their fixed buffers, rather than having to use some getc()/realloc() loop, as they might if the interface makes no promises about an upper bound, and if they're being paranoid about future-proofing the parser. If so I have no opinion on what value of "N" would be sane, other than it seems best to pick something. ? > On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote: >> diff --git a/man5/proc.5 b/man5/proc.5 >> index 115c8592e..b23dd1479 100644 >> --- a/man5/proc.5 >> +++ b/man5/proc.5 >> @@ -2092,9 +2092,13 @@ The filename of the executable, in > parentheses. Tools such as >> may alternatively (or additionally) use >> .IR/proc/ pid /cmdline. >> .IP >> -Strings longer than >> +For userspace, strings longer than >> .B TASK_COMM_LEN >> (16) characters (including the terminating null byte) are silently truncated. >> +Since Linux version 4.18.0 a longer limit of 64 (including the >> +terminating null byte) has applied to the kernel's own workqueue >> +workers (whose names start with "kworker/"). >> +.IP >> This is visible whether or not the executable is swapped out. >> .TP >> (3) \fIstate\fP \ %c