Re: [PATCH 0/2] proc.5: note broken v4.18 userspace promise

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 04 Jan 2023 21:59:38 +0100

On Wed, Dec 28 2022, Alejandro Colomar wrote:

> [[PGP Signed Part:Undecided]]
> Hi,
>
> On 12/23/22 19:12, Linus Torvalds wrote:
>> On Fri, Dec 23, 2022 at 10:00 AM Ævar Arnfjörð Bjarmason
>> <avarab@xxxxxxxxx> wrote:
>>>
>>> Whereas the fix here is a fix for a promise we're currently making
>>> which hasn't been true since v4.18.
>> Hah. Ack. Did anybody ever actually notice?
>> I wonder if the newer limit of 64 characters for kworkers shouldn't
>> even be mentioned at all, and if the 16-byte truncation for user space
>> should also be just removed.
>> Those limits should never have been some documented API, they were
>> always just implementation details, after all.
>>               Linus
>

Sorry about the late reply, holidays.

> I agree.  A variable implementation detail like this doesn't provide
> anything valuable to users; especially since there's no statbility
> promise at all.  I'd rewrite to just remove the (16) implementation
> detail.
>
> Ævar, would you send an v2 that removes implementation details, rather
> than fixing the details?

Maybe, because I'm not sure I'm qualified to document this anymore. My
current patch just extends the description to cover the 4.18 divergence.

Let's separate a few things here:

 A. The long-standing docs promise that it's limited to 16 bytes
 B. Since 4.18 that hasn't been true for (some of) the kernel's own
    processes, where the limit's been 64.
 C. Was the part of "A" where a limit was documented at all a good idea
    in retrospect?
 D. If "C"'s a "no" (which seems to be the consensus) what should the
    docs say?
 E. I hadn't mentioned this before, but the docs for prctl()'s
    PR_SET_NAME document the same 16 byte limit.

I think the current behavior since 4.18 is a broken userspace promise,
although admittedly a minor/obscure one.

I think even if going forward the documentation is deliberately
ambiguous about it, it would make sense to briefly document the 16 and
64 byte limits as past limits, to at least help to explain why current
code parsing "/proc/*/stat" seems to be confident in those (or more
commonly, the 16 bytes).

The code I wrote was rather anal about that promise, but e.g. looking at
htop(1)'s source code they've got a total limit of 2048 for this sort of
line (MAX_READ). I'm sure if I went fishing I could find other similar
cases (and probably some lower ones).

I don't think it would be good to just leave it ambiguous for those
trying to use this interface. They might assume any of 16 bytes (from
finding the prctl() PR_SET_NAME docs), 64 bytes (from reading kernel
sources), 255 (maximum filename length) etc.

Wouldn't the least bad thing be to:

 * Cover "A" and "B" in passing, i.e. explain past promised /
   implemented limits.
 * Note that this is no guarantee, but that...
 * ...we might use up to N, where N is some sane limit (1024? 2048?
   4096?).

   So programs that parse this now could just increase their fixed
   buffers, rather than having to use some getc()/realloc() loop, as
   they might if the interface makes no promises about an upper bound,
   and if they're being paranoid about future-proofing the parser.

   If so I have no opinion on what value of "N" would be sane, other
   than it seems best to pick something.

?

> On 12/23/22 18:59, Ævar Arnfjörð Bjarmason wrote:
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index 115c8592e..b23dd1479 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -2092,9 +2092,13 @@ The filename of the executable, in
>   parentheses. Tools such as
>>   may alternatively (or additionally) use
>>   .IR/proc/  pid /cmdline.
>>   .IP
>> -Strings longer than
>> +For userspace, strings longer than
>>   .B TASK_COMM_LEN
>>   (16) characters (including the terminating null byte) are silently truncated.
>> +Since Linux version 4.18.0 a longer limit of 64 (including the
>> +terminating null byte) has applied to the kernel's own workqueue
>> +workers (whose names start with "kworker/").
>> +.IP
>>   This is visible whether or not the executable is swapped out.
>>   .TP
>>   (3) \fIstate\fP \ %c