Re: [PATCH 1/1] ioctl_epoll.2: New page describing ioctl(2) operations for epoll fds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 06, 2024 at 11:39:58PM +0200, Alejandro Colomar wrote:
> Hi Joe,
> 
> On Tue, Jun 04, 2024 at 06:17:40PM GMT, Joe Damato wrote:
> > Signed-off-by: Joe Damato <jdamato@xxxxxxxxxx>
> 
> Thanks for the patch!  Please see a few comments below.

I've included some questions below just to make sure I give you a v2
that is much closer to correct :)

> 
> > ---
> >  man/man2/epoll_create.2 |   1 +
> >  man/man2/epoll_ctl.2    |   1 +
> >  man/man2/ioctl_epoll.2  | 203 ++++++++++++++++++++++++++++++++++++++++
> >  man/man7/epoll.7        |   1 +
> >  4 files changed, 206 insertions(+)
> >  create mode 100644 man/man2/ioctl_epoll.2
> > 
> > diff --git a/man/man2/epoll_create.2 b/man/man2/epoll_create.2
> > index f0327e8ba..2aa1745f5 100644
> > --- a/man/man2/epoll_create.2
> > +++ b/man/man2/epoll_create.2
> > @@ -141,4 +141,5 @@ on overrun.
> >  .BR close (2),
> >  .BR epoll_ctl (2),
> >  .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> >  .BR epoll (7)
> > diff --git a/man/man2/epoll_ctl.2 b/man/man2/epoll_ctl.2
> > index 6d5bc032e..24bbe7405 100644
> > --- a/man/man2/epoll_ctl.2
> > +++ b/man/man2/epoll_ctl.2
> > @@ -425,5 +425,6 @@ flag.
> >  .SH SEE ALSO
> >  .BR epoll_create (2),
> >  .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> >  .BR poll (2),
> >  .BR epoll (7)
> > diff --git a/man/man2/ioctl_epoll.2 b/man/man2/ioctl_epoll.2
> > new file mode 100644
> > index 000000000..1d53f458e
> > --- /dev/null
> > +++ b/man/man2/ioctl_epoll.2
> > @@ -0,0 +1,203 @@
> > +.\" Copyright (c) 2024, Joe Damato
> > +.\" Written by Joe Damato <jdamato@xxxxxxxxxx>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.\"
> 
> Please use only one consecutive .\"

Thanks, fixed.
 
> > +.TH ioctl_epoll 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +ioctl_epoll \- ioctl() operations for epoll file descriptors
> 
> Please add link pages <man2const/EPIOCSPARAMS.2const> and
> <man2const/EPIOCGPARAMS.2const>.

And after adding those, I'd add those to SEE ALSO and I'd omit the
description of them from the ioctl_epoll.2 page (because they'd have
their own pages) ?
 
> > +.SH LIBRARY
> > +Standard C library
> > +.RI ( libc ", " \-lc )
> > +.SH SYNOPSIS
> > +.nf
> > +.BR "#include <linux/eventpoll.h>" "  /* Definition of " EPIOC* " constants and struct epoll_params */"
> > +.B "#include <sys/ioctl.h>"
> > +.P
> > +.BI "int ioctl(int " fd ", int " op ", void " *argp ");"
> 
> The '*' should be bold; not italics.

Thanks, fixed.

> > +.fi
> > +.SH DESCRIPTION
> > +Various
> > +.BR ioctl (2)
> > +operations can be performed on an epoll file descriptor (created by a call
> > +to
> > +.BR epoll_create (2))
> > +(since Linux 6.9 and glibc 2.40) using calls of the form:
> > +.\" commit 18e2bf0edf4dd88d9656ec92395aa47392e85b61
> > +.\" glibc commit 92c270d32caf3f8d5a02b8e46c7ec5d9d0315158
> > +.P
> > +.in +4n
> > +.EX
> > +ioctl(fd, op, argp);
> > +.EE
> > +.in
> > +.P
> > +In the above,
> > +.I fd
> > +is a file descriptor referring to an epoll file descriptor obtained with a
> > +call to
> > +.BR epoll_create (2).
> > +.I op
> > +is one of the operations listed below, and
> > +.I argp
> > +is a pointer to the data structure described below.
> 
> If argp is a pointer to a structure, then the prototype should document
> that:
> 
> .BI "int ioctl(int " fd ", int " op ", struct epoll_params *" argp );
> 
> Also, I would document the two operations separately:
> 
> .BI "int ioctl(int " fd ", EPIOCSPARAMS, const struct epoll_params *" argp );
> .BI "int ioctl(int " fd ", EPIOCGPARAMS, struct epoll_params *" argp );
> 
> This allows documenting that the 'S' one doesn't modify the argp
> (AFAICS).

Do you mean that I should omit the generic 

 .BI "int ioctl(int " fd ", int " op ", struct epoll_params *" argp );

and instead have the above with the two ops?
 
> > +.\"
> > +.P
> > +All supported
> > +.I op
> > +values (described below) use an
> > +.I argp
> > +argument which is a pointer to a
> > +.I epoll_params
> > +structure, defined as:
> > +.P
> > +.in +4n
> > +.EX
> > +struct epoll_params {
> > +    uint32_t busy_poll_usecs;   /* Number of usecs to busy poll */
> > +    uint16_t busy_poll_budget;  /* Maximum number of packets to retrieve per poll */
> > +    uint8_t prefer_busy_poll;   /* Boolean to enable or disable prefer busy poll  */
> > +
> > +    /* pad the struct to a multiple of 64bits */
> > +    uint8_t __pad;              /* Must be zero */
> > +};
> 
> You could add this type definition to the SYNOPSIS, below the function
> prototypes.

OK, I moved it.

I'm not sure if it is formatted properly, though. I'll see if I can
find other examples of this style to check against.

> > +.EE
> > +.in
> > +.P
> > +The
> > +.I busy_poll_usecs
> > +field denotes the number of microseconds that the network stack will busy
> > +poll. During this time period, the network device will be polled
> > +repeatedly. This value cannot exceed
> > +.B INT_MAX .
> > +.in
> > +.P
> > +The
> > +.I busy_poll_budget
> > +field denotes the maximum number of packets that the network stack will
> > +be retrieved on each poll attempt. This value cannot exceed
> > +.B NAPI_POLL_WEIGHT
> > +which, as of Linux 6.9, is 64, unless the process is run with
> > +.B CAP_NET_ADMIN .
> > +.P
> > +The
> > +.I prefer_busy_poll
> > +field is a boolean field and must be either 0 (disabled) or 1 (enabled). If
> > +enabled, this indicates to the network stack that busy poll is the
> > +preferred method of processing network data and the network stack should
> > +give the application the opportunity to busy poll. Without this option,
> > +very busy systems may continue to do network processing via the normal
> > +method of IRQs triggering softIRQ and NAPI.
> > +.P
> > +The supported
> > +.I op
> > +values:
> > +.TP
> > +.B EPIOCSPARAMS
> > +This operation allows the caller to specify an
> > +.I epoll_params
> > +structure to configure the operation of epoll. Refer to the structure
> > +description of the structure above to learn what configuration is
> > +supported.
> > +.TP
> > +.B EPIOCGPARAMS
> > +This operation allows the caller to retrieve the current
> > +.I epoll_params
> > +structure. This can be used to determine what the current settings are.
> > +.SH RETURN VALUE
> > +On success, 0 is returned.
> > +On failure, \-1 is returned, and
> > +.I errno
> > +is set to indicate the error.
> > +.SH ERRORS
> > +.TP
> > +.B EOPNOTSUPP
> > +The kernel was not compiled with busy poll support.
> > +.TP
> > +.B ENOIOCTLCMD
> 
> Is this a thing?
> 
> 	$ grep -rn ENOIOCTLCMD /usr/include/
> 	$ 
> 
> I suspect this is EINVAL in user space?  (Actually, you list the same
> exact error condition for EINVAL below.)

Good catch!

Yes, I looked at the kernel code, it looks like this
becomes ENOTTY before entering userland, but due to the way the code
was written (I'll double check again) I think I wrote this to return
EINVAL.

> > +The specified
> > +.I op
> > +is invalid.
> > +.TP
> > +.B EINVAL
> > +The
> > +.I fd
> > +specified is not an epoll file descriptor, or the
> > +.I op
> > +specified is invalid, or the
> > +.I __pad
> > +was not initialized to zero, or
> > +.I busy_poll_usecs
> > +exceeds
> > +.B INT_MAX ,
> > +or
> > +.I prefer_busy_poll
> > +is not 0 or 1.
> 
> Please separate the different conditions for EINVAL into separate
> entries.  The ones that are related can go in the same one, but the
> unrelated ones are better split.

Thanks!

I did something like this:

.TP
.B EINVAL
The
.I fd
specified is not an epoll file descriptor.
.TP
.B EINVAL
The
.I op
specified is invalid.
.TP
.B EINVAL
The
.I argp.__pad
was not initialized to zero.

is that what you meant?

> > +.TP
> > +.B EPERM
> > +The process is being run without
> > +.I CAP_NET_ADMIN
> 
> This should be bold.

Thanks, fixed.

> > +and the specified
> > +.I busy_poll_budget
> 
> This should be
> 
> .I argp.busy_poll_budget

Thanks, fixed.

> > +exceeds
> > +.I NAPI_POLL_WEIGHT
> 
> This should be bold.

Thanks, fixed.

> > +(which is 64 as of Linux 6.9).
> > +.TP
> > +.B EFAULT
> > +.I argp
> > +does not point to a valid memory address.
> > +.SH EXAMPLES
> > +.EX
> 
> Could you write an entire program, with a main()?
> 
> If not, it's fine; this is better than nothing.  But we prefer having
> entire programs that users can paste and try.

Hmm.. I can give it a try! It's a bit tricky because to actually use
busy polling, you need to have a program that accepts incoming
connections and calls epoll_wait (after setting the appropriate
values with the ioctl).

I could write something simple that does that but it would be a bit
long, I think.

> > +/* Code to set the epoll params to enable busy polling */
> > +\&
> > +int epollfd = epoll_create1(0);
> > +struct epoll_params params;
> > +\&
> > +if (epollfd == \-1) {
> > +    perror("epoll_create1");
> > +    exit(EXIT_FAILURE);
> > +}
> > +\&
> > +memset(&params, 0, sizeof(struct epoll_params));
> > +\&
> > +params.busy_poll_usecs = 25;
> > +params.busy_poll_budget = 8;
> > +params.prefer_busy_poll = 1;
> > +\&
> > +if (ioctl(epollfd, EPIOCSPARAMS, &params) == \-1) {
> > +    perror("ioctl");
> > +    exit(EXIT_FAILURE);
> > +}
> > +\&
> > +/* Code to show how to retrieve the current settings */
> > +\&
> > +memset(&params, 0, sizeof(struct epoll_params));
> > +\&
> > +if (ioctl(epollfd, EPIOCGPARAMS, &params) == \-1) {
> > +    perror("ioctl");
> > +    exit(EXIT_FAILURE);
> > +}
> > +\&
> > +/* params struct now contains the current parameters */
> > +\&
> > +fprintf(stderr, "epoll usecs: %lu\\n", params.busy_poll_usecs);
> > +fprintf(stderr, "epoll packet budget: %u\\n", params.busy_poll_budget);
> > +fprintf(stderr, "epoll prefer busy poll: %u\\n", params.prefer_busy_poll);
> > +\&
> > +.SH History
> > +Linux 6.9.
> > +glibc 2.40.
> > +.SH SEE ALSO
> > +.BR ioctl (2),
> > +.BR epoll_create (2),
> > +.BR epoll_create1 (2),
> > +.BR epoll (7)
> > +.P
> > +.I Documentation/networking/napi.rst
> > +.P
> > +and
> > +.P
> > +.I Documentation/admin-guide/sysctl/net.rst
> > +.P
> > +in the Linux kernel source tree
> 
> We could document that as
> 
> .I linux.git/Documentation/...
> 
> to not have to say "in the Linux kernel source tree".

OK, I updated the above to be what you suggested.

> > diff --git a/man/man7/epoll.7 b/man/man7/epoll.7
> > index e7892922e..4ad032bdd 100644
> > --- a/man/man7/epoll.7
> > +++ b/man/man7/epoll.7
> > @@ -606,5 +606,6 @@ is present in an epoll instance.
> >  .BR epoll_create1 (2),
> >  .BR epoll_ctl (2),
> >  .BR epoll_wait (2),
> > +.BR ioctl_epoll (2),
> >  .BR poll (2),
> >  .BR select (2)
> > -- 
> > 2.34.1
> > 
> 
> -- 
> <https://www.alejandro-colomar.es/>






[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux