Re: [RFC PATCH 2/3] add statmnt(2) syscall

Matthew House <mattlloydhouse@xxxxxxxxx> · Mon, 18 Sep 2023 20:37:42 -0400

On Mon, Sep 18, 2023 at 11:39 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> Okay, so there are now (at least) two buffers, and on overflow the
> caller cannot know which one got overflown.  It can resize both, but
> that doesn't make the caller any simpler to implement.
>
> Also the interface is kind of weird in that some struct members are
> out, some are in (the pointers and the lengths).
>
> I'd prefer the single buffer interface, which has none of the above issues.
>
> Thanks,
> Miklos

One natural solution is to set either of the two lengths to the expected
size if the provided buffer are too small. That way, the caller learns both
which of the buffers is too small, and how large they need to be. Replacing
a provided size with an expected size in this way already has precedent in
existing syscalls:

recvmsg(2):
    The msg argument points to an in/out struct msghdr, and msg->msg_name
    points to an optional buffer which receives the source address. If
    msg->msg_namelen is less than the actual size of the source address,
    the function truncates the address to that length before storing it in
    msg->msg_name; otherwise, it stores the full address. In either case,
    it sets msg->msg_namelen to the full size of the source address before
    returning.

(An address buffer size is similarly provided directly as an in/out pointer
in accept(2), accept4(2), getpeername(2), getsockname(2), and recvfrom(2).)

name_to_handle_at(2):
    The handle argument points to an in/out struct file_handle, followed by
    a variable-length char array. If handle->handle_bytes is too small to
    store the opaque handle, the function returns -EOVERFLOW; otherwise,
    it succeeds. In either case, it sets handle->handle_bytes to the size
    of the opaque handle before returning.

perf_event_open(2):
    The attr argument points to an in/out struct perf_event_attr. If
    attr->size is not a valid size for the struct, the function sets it to
    the latest size and returns -E2BIG.

sched_setattr(2):
    The attr argument points to an in/out struct sched_attr. If attr->size
    is not a valid size for the struct, the function sets it to the latest
    size and returns -E2BIG.

The specific pattern of returning the actual size of the strings both on
success and on failure, as with recvmsg(2) and name_to_handle_at(2), is
beneficial for callers that want to copy the strings elsewhere without
having to scan for the null byte. (Also, it would work well if we ever
wanted to return variable-size binary data, such as arrays of structs.)

Indeed, if we returned the actual size of the string, we could even take a
more radical approach of never setting a null byte after the data, leaving
the caller to append its own null byte if it really wants one. But perhaps
that would be taking it a bit too far; I just don't want this API to end up
in an awful situation like strncpy(3) or struct sockaddr_un, where the
buffer is always null-terminated except in one particular edge case. Also,
if we include a null byte in the returned size, it could invite off-by-one
errors in callers that just expect it to be the length of the string.

Meanwhile, if this solution of in/out size fields were adopted, then
there'd still be the question of what to do when a provided size is too
small: should the returned string be truncated (indicating the issue only
by the returned size being greater than the provided size), or should the
entire call fail with an -EOVERFLOW? IMO, the former is strictly more
flexible, since the caller can set a limit on how big a buffer it's willing
to dedicate to any particular string, and it can still retrieve the
remaining data if that buffer isn't quite big enough. But the latter might
be considered a bit more foolproof against callers who don't properly test
for truncation.

Thank you,
Matthew House