Joe Damato wrote:
> On Sat, Jan 27, 2024 at 11:20:51AM -0500, Willem de Bruijn wrote:
> > Joe Damato wrote:
> > > Greetings:
> > >
> > > Welcome to v3. Cover letter updated from v2 to explain why ioctl and
> > > adjusted my cc_cmd to try to get the correct people in addition to folks
> > > who were added in v1 & v2. Labeled as net-next because it seems networking
> > > related to me even though it is fs code.
> > >
> > > TL;DR This builds on commit bf3b9f6372c4 ("epoll: Add busy poll support to
> > > epoll with socket fds.") by allowing user applications to enable
> > > epoll-based busy polling and set a busy poll packet budget on a per epoll
> > > context basis.
> > >
> > > This makes epoll-based busy polling much more usable for user
> > > applications than the current system-wide sysctl and hardcoded budget.
> > >
> > > To allow for this, two ioctls have been added for epoll contexts for
> > > getting and setting a new struct, struct epoll_params.
> > >
> > > ioctl was chosen vs a new syscall after reviewing a suggestion by Willem
> > > de Bruijn [1]. I am open to using a new syscall instead of an ioctl, but it
> > > seemed that:
> > > - Busy poll affects all existing epoll_wait and epoll_pwait variants in
> > >   the same way, so new versions of many syscalls might be needed. It
> >
> > There is no need to support a new feature on legacy calls. Applications have
> > to be upgraded to the new ioctl, so they can also be upgraded to the latest
> > epoll_wait variant.
>
> Sure, that's a fair point. I think we could probably make reasonable
> arguments in both directions about the pros/cons of each approach.
>
> It's still not clear to me that a new syscall is the best way to go on
> this, and IMO it does not offer a clear advantage. I understand that part
> of the premise of your argument is that ioctls are not recommended, but in
> this particular case it seems like a good use case and there have been
> new ioctls added recently (at least according to git log).
>
> This makes me think that while their use is not recommended, they can serve
> a purpose in specific use cases. To me, this use case seems very fitting.
>
> More of a joke and I hate to mention this, but this setting is changing how
> io is done and it seems fitting that this is done via an ioctl ;)
>
> > epoll_pwait extends epoll_wait with a sigmask.
> > epoll_pwait2 extends epoll_pwait with nsec resolution timespec.
> > Since they are supersets, nothing is lost by limiting to the most recent API.
> >
> > In the discussion of epoll_pwait2 the addition of a forward looking flags
> > argument was discussed, but eventually dropped, based on the argument that
> > adding a syscall is not a big task and does not warrant preemptive code.
> > This decision did receive a suitably snarky comment from Jonathan Corbet [1].
> >
> > It is definitely more boilerplate, but essentially it is as feasible to add an
> > epoll_pwait3 that takes an optional busy poll argument. In which case, I also
> > believe that it makes more sense to configure the behavior of the syscall
> > directly, than through another syscall and state stored in the kernel.
>
> I definitely hear what you are saying; I think I'm still not convinced, but
> I am thinking it through.
>
> In my mind, all of the other busy poll settings are configured by setting
> options on the sockets using various SO_* options, which modify some state
> in the kernel. The existing system-wide busy poll sysctl also does this. It
> feels strange to me to diverge from that pattern just for epoll.
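
(For concreteness, the per-socket path referred to above is setsockopt()
with SO_BUSY_POLL, plus SO_PREFER_BUSY_POLL and SO_BUSY_POLL_BUDGET on
newer kernels. A rough sketch is below; the values are only examples,
the fallback option numbers are the asm-generic ones, and depending on
kernel version and value some of these calls may need CAP_NET_ADMIN.)

    #include <sys/socket.h>

    #ifndef SO_PREFER_BUSY_POLL         /* only present in newer uapi headers */
    #define SO_PREFER_BUSY_POLL 69      /* asm-generic value */
    #define SO_BUSY_POLL_BUDGET 70      /* asm-generic value */
    #endif

    /* Opt a single socket in to busy polling: poll its NAPI context for up
     * to 'usecs' before sleeping, with a per-poll packet budget. */
    static int enable_sock_busy_poll(int fd)
    {
            int usecs  = 64;    /* example busy poll duration, in microseconds */
            int prefer = 1;     /* opt in to "preferred" busy polling */
            int budget = 16;    /* example per-poll packet budget */

            if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs)))
                    return -1;
            if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &prefer, sizeof(prefer)))
                    return -1;
            return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget));
    }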

I think the stateful approach for read is because there we do want to
support all variants: read, readv, recv, recvfrom, recvmsg, recvmmsg.
So there is no way to pass it directly.

That said, I don't mean to argue strenuously for this API or against
yours. Want to make sure the option space is explored. There does not
seem to be much other feedback. I don't hold a strong opinion either.

> In the case of epoll_pwait2 the addition of a new syscall is an approach
> that I think makes a lot of sense. The new system call is also probably
> better from an end-user usability perspective. For busy poll, I don't
> see a clear reason why a new system call is better, but maybe I am
> still missing something.
>
> > I don't think that the usec fine grain busy poll argument is all that useful.
> > Documentation always suggests setting it to 50us or 100us, based on limited
> > data. Main point is to set it to exceed the round-trip delay of whatever the
> > process is trying to wait on. Overestimating is not costly, as the call
> > returns as soon as the condition is met. An epoll_pwait3 flag EPOLL_BUSY_POLL
> > with default 100us might be sufficient.
> >
> > [1] https://lwn.net/Articles/837816/
>
> Perhaps I am misunderstanding what you are suggesting, but I am opposed to
> hardcoding a value. If it is currently configurable system-wide and via
> SO_* options for other forms of busy poll, I think it should similarly be
> configurable for epoll busy poll.
>
> I may yet be convinced by the new syscall argument, but I don't think I'd
> agree on imposing a default. The value can be modified by other forms of
> busy poll and the goals of my changes are to:
> - make epoll-based busy poll per context
> - allow applications to configure (within reason) how epoll-based busy
>   poll behaves, like they can do now with the existing SO_* options for
>   other busy poll methods.

Okay. I expected some pushback. Was curious if people would come back
with examples of where the full range is actually being used.

> > > seems much simpler for users to use the correct
> > > epoll_wait/epoll_pwait for their app and add a call to ioctl to enable
> > > or disable busy poll as needed. This also probably means less work to
> > > get an existing epoll app using busy poll.
> >
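
(To illustrate that last point: with the proposed interface, an existing
epoll application would roughly add one ioctl after creating the context,
as sketched below. This assumes headers that already expose the series'
additions; the field names follow the cover letter, the values are
arbitrary examples, and the exact struct layout and ioctl names may
differ between revisions of the series.)

    #include <stdio.h>
    #include <sys/epoll.h>      /* assumes struct epoll_params / EPIOCSPARAMS are exposed */
    #include <sys/ioctl.h>

    /* Create an epoll context and enable per-context busy polling on it. */
    static int create_busy_poll_epoll(void)
    {
            struct epoll_params params = {
                    .busy_poll_usecs  = 64,   /* example busy poll duration */
                    .busy_poll_budget = 16,   /* example packet budget */
            };
            int epfd = epoll_create1(0);

            if (epfd == -1)
                    return -1;
            if (ioctl(epfd, EPIOCSPARAMS, &params) == -1)
                    perror("EPIOCSPARAMS");   /* e.g. kernel without this series */

            /* existing epoll_ctl()/epoll_wait() usage continues unchanged */
            return epfd;
    }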