On Fri, Aug 16, 2024 at 01:01:42PM -0400, Willem de Bruijn wrote: > Joe Damato wrote: > > On Fri, Aug 16, 2024 at 10:59:51AM -0400, Willem de Bruijn wrote: [...] > > > Willem de Bruijn wrote: > > > If the only goal is to safely reenable interrupts when the application > > > stops calling epoll_wait, does this have to be user tunable? > > > > > > Can it be either a single good enough constant, or derived from > > > another tunable, like busypoll_read. > > > > I believe you meant busy_read here, is that right? > > > > At any rate: > > > > - I don't think a single constant is appropriate, just as it > > wasn't appropriate for the existing mechanism > > (napi_defer_hard_irqs/gro_flush_timeout), and > > > > - Deriving the value from a pre-existing parameter to preserve the > > ABI, like busy_read, makes using this more confusing for users > > and complicates the API significantly. > > > > I agree we should get the API right from the start; that's why we've > > submit this as an RFC ;) > > > > We are happy to take suggestions from the community, but, IMHO, > > re-using an existing parameter for a different purpose only in > > certain circumstances (if I understand your suggestions) is a much > > worse choice than adding a new tunable that clearly states its > > intended singular purpose. > > Ack. I was thinking whether an epoll flag through your new epoll > ioctl interface to toggle the IRQ suspension (and timer start) > would be preferable. Because more fine grained. I understand why you are asking about this and I think it would be great if this were possible, but unfortunately it isn't. epoll contexts can be associated with any NAPI ID, but the IRQ suspension is NAPI specific. As an example: imagine a user program creates an epoll context and adds fds with NAPI ID 1111 to the context. It then issues the ioctl to set the suspend timeout for that context. Then, for whatever reason, the user app decides to remove all the fds and add new ones, this time from NAPI ID 2222, which happens to be a different net_device. What does that mean for the suspend timeout? It's not clear to me what the right behavior would be in this situation (does it persist? does it get cleared when a new NAPI ID is added? etc) and it makes the user API much more complicated, with many more edge cases and possible bugs. > Also, the value is likely dependent more on the expected duration > of userspace processing? If so, it would be the same for all > devices, so does a per-netdev value make sense? There is presently no way to set values like gro_timeout, defer_hard_irqs, or this new proposed value on a per-NAPI basis. IMHO, that is really where all of these values should live. I mentioned on the list previously (and also in the cover letter), that time permitting, I think the correct evolution of this would be to support per-NAPI settings (via netdev-genl, I assume) so that user programs can set all 3 values on only the NAPIs they care about. Until that functionality is available, it would seem per-netdev is the only way for this feature to be added at the present time. I simply haven't had the time to add the above interface. This feature we're proposing has demonstrable performance value, but it doesn't seem sensible to block it until time permits me to add a per-NAPI interface for all of these values given that we already globally expose 2 of them. That said, I appreciate the thoughtfulness of your replies and I am open to other suggestions. - Joe