Re: [PATCH RFC v2 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 02/05 08:44, Michael Kerrisk (man-pages) wrote:
> Hello Fam Zheng,
> 
> On 02/05/2015 02:52 AM, Fam Zheng wrote:
> > On Wed, 02/04 13:44, Michael Kerrisk (man-pages) wrote:
> >> Hello Fam Zheng,
> >>
> >> On 02/04/2015 11:36 AM, Fam Zheng wrote:
> >>> Changes from v1 (https://lkml.org/lkml/2015/1/20/189):
> >>>
> >>>   - As discussed in previous thread [1], split the call to epoll_ctl_batch and
> >>>     epoll_pwait. [Michael]
> >>>
> >>>   - Fix memory leaks. [Omar]
> >>>
> >>>   - Add a short comment about the ignored copy_to_user failure. [Omar]
> >>>
> >>>   - Cover letter rewritten.
> >>>
> >>> This adds two new system calls as described below. I mentioned glibc wrapping
> >>> of sigarg in epoll_pwait1 description, so don't read it as end user man page
> >>> yet.
> >>
> >> Fair enough. But I think it would be helpful to say a little more already.
> >> See my comment below.
> >>
> >>> One caveat is the possible failure of the last copy_to_user in epoll_ctl_batch,
> >>> which returns per command error code. Ideas to improve that are welcome!
> >>>
> >>> 1) epoll_ctl_batch
> >>> ------------------
> >>>
> >>> NAME
> >>>        epoll_ctl_batch - modify an epoll descriptor in batch
> >>>
> >>> SYNOPSIS
> >>>
> >>>        #include <sys/epoll.h>
> >>>
> >>>        int epoll_ctl_batch(int epfd, int flags,
> >>>                            int ncmds, struct epoll_ctl_cmd *cmds);
> >>>
> >>> DESCRIPTION
> >>>
> >>>        The system call performs a batch of epoll_ctl operations. It allows
> >>>        efficient update of events on this epoll descriptor.
> >>>
> >>>        Flags is reserved and must be 0.
> >>>
> >>>        Each operation is specified as an element in the cmds array, defined as:
> >>>
> >>>            struct epoll_ctl_cmd {
> >>>
> >>>                   /* Reserved flags for future extension, must be 0. */
> >>>                   int flags;
> >>>
> >>>                   /* The same as epoll_ctl() op parameter. */
> >>>                   int op;
> >>>
> >>>                   /* The same as epoll_ctl() fd parameter. */
> >>>                   int fd;
> >>>
> >>>                   /* The same as the "events" field in struct epoll_event. */
> >>>                   uint32_t events;
> >>>
> >>>                   /* The same as the "data" field in struct epoll_event. */
> >>>                   uint64_t data;
> >>>
> >>>                   /* Output field, will be set to the return code after this
> >>>                    * command is executed by kernel */
> >>>                   int error_hint;
> >>
> >> Why 'error_hint', rather than just stay 'status' or 'result'? I mean
> >> is it really a "hint"? Also, it can be 0 meaning "success" (no error).
> > 
> > Because the convention is weak here: the copy_to_user, to assign values to this
> > field in user provided array, could (very unlikely) fail, at which point the
> > commands are already executed so there is no return. This corner case
> > inconsistency makes it hard to be a contract.
> 
> I still think it's a poor name. Yes, it could unlikely fail.
> Still, 'status' or 'result_status' or something like that is better.
> 
> >>>            };
> >>>
> >>>        Commands are executed in their order in cmds, and if one of them failed,
> >>>        the rest after it are not tried.
> >>>
> >>>        Not that this call isn't atomic in terms of updating the epoll
> >>>        descriptor, which means a second epoll_ctl or epoll_ctl_batch call
> >>>        during the first epoll_ctl_batch may make the operation sequence
> >>>        interleaved. However, each single epoll_ctl_cmd operation has the same
> >>>        semantics as a epoll_ctl call.
> >>
> >> (Good to include that warning!)
> >>
> >>> RETURN VALUE
> >>>
> >>>        If one or more of the parameters are incorrect, -1 is returned and errno
> >>>        is set appropriately. Otherwise, the number of succeeded commands is
> >>>        returned.
> >>>
> >>>        Each error_hint field may be set to the error code or 0, depending on
> >>>        the result of the command. If there is some error in returning the error
> >>>        to user, it may also be unchanged, even though the command may actually
> >>>        be executed. In this case, it's still ensured that the i-th (i = ret)
> >>>        command is the failed command.
> >>
> >> Sorry -- I'm not clear here. Can you say some more here? What sort
> >> of error might there be when "returning the error to the user"?
> > 
> > As explained above. See also the last comment of Omar:
> > 
> > https://lkml.org/lkml/2015/1/21/103
> 
> Okay -- but could you please actually explain this in more detail in the 
> man page. Then others do not need to ask the same question.

OK, I'll name it 'status' and add more details in next revision.

> 
> > <...>
> > 
> >>> DESCRIPTION
> >>>
> >>>        The epoll_pwait1 system call differs from epoll_pwait only in parameter
> >>>        types. The first difference is timeout, a pointer to timespec structure
> >>>        which allows nanosecond presicion; the second difference, which should
> >>>        probably be wrapper by glibc and only expose a sigset_t pointer as in
> >>>        pselect6.
> >>
> >> Here I think it still helps to explain that 'struct sigags' is a sigset_t* +
> >> the size of the pointed-to set.
> > 
> > Yes, will add.
> > 
> >>
> >>>        If timeout is NULL, it's treated as if 0 is specified in epoll_pwait
> >>
> >> The convention I would expect here is that NULL means infinite timeout,
> >> like select(), and timeout == {0,0} would get the "return immediately"
> >> behavior. Why did you choose your convention? And, how does one otherwise 
> >> request an infinite timeout?
> > 
> > I'm wrong. I should follow the conventions of pselect and ppoll. Thanks for
> > catching that.
> 
> Good.
> 
> >>
> >>>        (return immediately). Otherwise it's converted to nanosecond scalar,
> >>>        again, with the same convention as epoll_pwait's timeout.
> >>>
> >>> RETURN VALUE
> >>>
> >>>        The same as said in epoll_pwait(2).
> >>>
> >>> ERRORS
> >>>
> >>>        The same as said in man epoll_pwait(2), plus:
> >>>
> >>>        EINVAL flags is not zero.
> >>>
> >>>
> >>> CONFORMING TO
> >>>
> >>>        epoll_pwait1() is Linux-specific.
> >>>
> >>> SEE ALSO
> >>>
> >>>        epoll_create(2), epoll_ctl(2), epoll_wait(2), epoll_pwait(2), epoll(7)
> >>
> >> In your earlier patch set, there was the ability to select the clock
> >> used for timeouts. Why did this go away? I'm not sure whether we need
> >> that functionality or not, but it would be good to know why you
> >> dropped it this time.
> >>
> > 
> > No room for another "int clockid". Options:
> 
> Ahh yes. I overlooked that. But again, if your commit message or
> the man page said something about why you dropped this argument,
> then we would not run around in circles having the same 
> conversations repeatedly.
> 
> I keep loving this recent quote from akpm:
> http://lwn.net/Articles/616226/

Good point!

> 
> > 0) Don't have it.
> > 
> > 1) "Or" into lower bits of flags.
> > 
> > 2) Encode into flags bits.
> > 
> > 3) Squash "int clockid" in the last argument (currently "sig" but need to name
> > it "spec", again).
> 
> Well, it's up to you. In https://lkml.org/lkml/2015/1/20/189
> you initially proposed to have the clockid. Do you need it,
> or not? I am agnostic about it. If you do decide you want it,
> then I think (3) is the best option.

Then let's do it this way. clockid would be useful because different
applications care about different clocks.

> 
> I'm very pleased that you expand this man page as you go.
> But, for the next iteration, I think it would be better to make
> it even more complete--it really is helpful to have documentation
> as a starting point to discuss the API, and the better the doc,
> the better the discussion.
> 

Sure, I'll take your points. Thanks for the advice!

Fam
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux