[Explicitly CCing Florian Weimer, since he may have some thoughts.]
Hello Eric,
On 8/29/19 5:50 PM, Eric Blake wrote:
The Austin Group is considering standardizing a subset of the Linux
fcntl(F_GETOWN_EX), because of its ability to overcome the limitation
that fcntl(F_GETOWN) must fail for some valid pids if pid_t is permitted
to be wider than int (whether or not Linux ever reaches a point where
pid_t is wider than int, POSIX did not want to make that restriction on
other implementations). See http://austingroupbugs.net/view.php?id=1274
However, we've run into a minor issue which implies that man-pages
and/or glibc is buggy:
The man page for fcntl() (as of Fedora 30 man-pages-4.16-4.fc30) states:
    struct f_owner_ex {
        int   type;
        pid_t pid;
    };
but in the headers under /usr/include, there are two different
definitions, which raises the question of what the real type of 'type'
should be:
/usr/include/asm-generic/fcntl.h (from kernel-headers-5.2.9-200.fc30):
    struct f_owner_ex {
        int type;
        __kernel_pid_t pid;
    };
/usr/include/bits/fcntl-linux.h (from glibc-headers-2.29-15.fc30):
    struct f_owner_ex
    {
        enum __pid_type type;   /* Owner type of ID. */
        __pid_t pid;            /* ID of owner. */
    };
Note that an enum instead of an int matters as to whether this will
complain when compiled:
    struct f_owner_ex s;
    int *foo = &s.type;
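To make the difference concrete, here is a minimal sketch using stand-in declarations rather than the real headers. The names demo_pid_type, owner_with_int, and owner_with_enum are hypothetical and exist only for illustration. Taking the address of the member as an 'int *' is fine with the int declaration, while with the enum declaration the pointer types are incompatible on implementations where the enum is not compatible with int, so a compiler will typically diagnose it. Copying the value into an int works under either declaration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two competing declarations;
   these are NOT the definitions from the kernel or glibc headers. */
enum demo_pid_type {
    DEMO_OWNER_TID = 0,
    DEMO_OWNER_PID = 1,
    DEMO_OWNER_PGRP = 2
};

struct owner_with_int  { int type; int pid; };
struct owner_with_enum { enum demo_pid_type type; int pid; };

/* With an int member, taking its address as 'int *' is valid. */
static int read_type_via_pointer(struct owner_with_int *s)
{
    int *p = &s->type;
    return *p;
}

/* With an enum member, 'int *p = &s->type;' mixes pointer types
   and is likely to be diagnosed.  Copying the value is portable
   under either declaration. */
static int read_type_by_value(const struct owner_with_enum *s)
{
    int t = s->type;
    return t;
}
```

Code written in the second style compiles whichever way the member ends up being declared.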
Therefore, we want to confirm whether requiring the eventual POSIX
definition to use enum f_pid_type (as currently worded in
austingroupbugs.net/view.php?id=1274#c4536) is okay (in which case,
there is a bug in the man page for documenting int instead of enum
f_pid_type), or if POSIX should not bother defining enum f_pid_type (and
instead just provide F_OWNER_PID and F_OWNER_PGRP as macros) with
f_owner_ex being defined with an int (in which case, the glibc <fcntl.h>
header needs a change to use int, and the Austin Group proposal needs to
be tweaked to match).
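For reference, a rough sketch of how an application might use the interface on Linux with glibc. The helper name owner_round_trip() is hypothetical, and the pipe is only a convenient file descriptor to attach an owner to. The sketch assigns to the members instead of taking their addresses, so it should compile whether 'type' is declared as an int or as an enum.

```c
#define _GNU_SOURCE   /* glibc exposes F_SETOWN_EX and F_GETOWN_EX only with this */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set this process as the owner of a pipe's read
   end, read the owner back, and return 0 if the two agree, else -1. */
static int owner_round_trip(void)
{
    int fds[2];
    struct f_owner_ex set, get;
    int ok;

    if (pipe(fds) == -1)
        return -1;

    set.type = F_OWNER_PID;
    set.pid = getpid();

    /* Placeholder values; F_GETOWN_EX overwrites both members. */
    get.type = F_OWNER_PGRP;
    get.pid = 0;

    ok = fcntl(fds[0], F_SETOWN_EX, &set) == 0
      && fcntl(fds[0], F_GETOWN_EX, &get) == 0
      && get.type == F_OWNER_PID
      && get.pid == getpid();

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Because the owner comes back through the structure and not through the return value of fcntl(), the call has no need to squeeze an ID into an int.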
So, a little background.
The kernel feature was added in Linux 2.6.32, which was tagged in
December 2009.
I added the manual page text at the start of October 2009, based on
the types used in the kernel structure.
By chance, the glibc structure definition was added at the end of the
same month. (I do not recall, but I suspect that I did not notice
the glibc addition.)
I do not know what the rationale was for the addition of the 'enum',
and it wouldn't surprise me if there was no public discussion about
it. The use of an 'enum' strikes me as a slightly odd decision (given
that the kernel uses 'int') but, related to your point below, there
is precedent in, for example, the use of an 'enum' for 'idtype_t' in
waitid() inside glibc, while the kernel type for the argument in
the underlying system call is 'int'.
Note that the use of an enum in a public struct makes that struct
dependent on ABI issues (if the library is compiled with one set of
compiler flags where enums occupy the space of 'int', but an application
compiles with a different set of flags where an enum occupies only the
space of 'char', this could result in the application being unable to
correctly call into libc), if that helps sway the decision on which of
the two projects needs to change. However, the exact layout of the
struct and any padding space was not deemed to be a showstopper (that
is, similar to struct stat, the standard intends only to require that at
least two members be present in f_owner_ex without any further
restrictions on what layout those two members occupy).
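The ABI concern can be checked with a small sketch along these lines. The enum and struct names are again hypothetical stand-ins, with plain int in place of pid_t. The helper reports whether, under the flags used for the current compilation, the enum-based struct has the same layout as the int-based one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real kernel or glibc definitions. */
enum demo_pid_type {
    DEMO_OWNER_TID = 0,
    DEMO_OWNER_PID = 1,
    DEMO_OWNER_PGRP = 2
};

struct owner_with_int  { int type; int pid; };
struct owner_with_enum { enum demo_pid_type type; int pid; };

/* Size the compiler chose for the enum under the current flags. */
static size_t enum_member_size(void)
{
    return sizeof(enum demo_pid_type);
}

/* Nonzero if both structs have the same member width, member offset,
   and overall size in this compilation. */
static int layouts_agree(void)
{
    return sizeof(enum demo_pid_type) == sizeof(int)
        && offsetof(struct owner_with_enum, pid)
           == offsetof(struct owner_with_int, pid)
        && sizeof(struct owner_with_enum) == sizeof(struct owner_with_int);
}
```

With default settings on common ABIs the two layouts would be expected to agree. With an option such as GCC's -fshort-enums the enum can shrink to a single byte, so even if alignment padding leaves 'pid' at the same offset, the width of 'type' no longer matches what a library built without that option reads or writes.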
A side note was also raised during discussion: POSIX already
standardizes the type idtype_t for use in waitid(), and on Linux, we
happen to have P_PID==F_OWNER_PID==1 and P_PGID==F_OWNER_PGRP==2 (which
are the only values that POSIX is considering adding), which on the
surface looks like unnecessary duplication. So at one point, the
question was raised whether POSIX should reuse the existing idtype_t
instead of inventing something new for f_owner_ex. However, it was then
pointed out that idtype_t also includes P_ALL (which on Linux is 0), and
that Linux uses F_OWNER_TID==0 as an extension to what POSIX would
require, but Linux's F_OWNER_TID semantics for F_SETOWN_EX are not the
same as the P_ALL semantics in waitid(). Furthermore, <fcntl.h> has free
rein to add more F_* names into the namespace but not P_* names, so
reusing the idtype_t type would require dragging in the <sys/wait.h>
header just to populate f_owner_ex. Thus, this reuse of types was deemed
unpalatable.
I'm agnostic on whether it's the manual page or glibc that should
be fixed. The ABI issues that you note above are unfortunate, of
course. (Do they not suggest that the standard really should use 'int'?)
Cheers,
Michael