On Thu, 2017-03-23 at 20:42 -0700, Alexander Duyck wrote:
> On Thu, Mar 23, 2017 at 6:24 PM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> > On Thu, 2017-03-23 at 14:37 -0700, Alexander Duyck wrote:
> >> From: Alexander Duyck <alexander.h.duyck@xxxxxxxxx>
> >>
> >
> >> The last bit I changed is to move from using a shift by 10 to just
> >> using NSEC_PER_USEC, with multiplication for any run-time calculations
> >> and division for a few compile-time ones. This should be more accurate
> >> and perform about the same on most architectures, since modern CPUs
> >> typically handle multiplication without too much overhead.
> >
> > busy polling thread can be preempted for more than 2 seconds.
>
> If it is preempted, is the timing value even valid anymore? I was
> wondering about that. Also, when preemption is enabled, is there
> anything to prevent us from being migrated to another CPU? If not, what
> do we do about architectures that allow drift between the clocks?
>
> > Using usec instead of nanoseconds gave us 3 orders of magnitude cushion.
>
> Yes, but the problem is we also opened up an issue where, if the clock
> was approaching a roll-over, we could add a value to it that would put
> us in a state where we would never time out.

If you believe there is a bug, send a fix for the net tree?

I really do not see a bug, given we use time_after(now, end_time),
which handles roll-over just fine.
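To illustrate the point, here is a minimal user space sketch of the
wrap-safe comparison (modeled on the time_after() macro from
include/linux/jiffies.h, with the typecheck() noise dropped and made-up
values):

#include <stdio.h>
#include <limits.h>

/* Simplified model of time_after(a, b): true if a is after b.
 * The signed subtraction keeps the result correct across a counter
 * wrap, as long as the two values are less than LONG_MAX apart.
 */
#define time_after(a, b)  ((long)((b) - (a)) < 0)

int main(void)
{
        unsigned long end_time = ULONG_MAX - 5; /* deadline just before the wrap */
        unsigned long now = end_time + 10;      /* "now" has wrapped past zero */

        /* now is numerically tiny after the wrap, but the signed
         * difference still shows it is past end_time, so the busy
         * poll loop times out as expected.
         */
        printf("expired: %d\n", time_after(now, end_time));    /* prints 1 */
        return 0;
}

The comparison only goes wrong if now and end_time get more than
LONG_MAX apart: with usec resolution in an unsigned long on 32-bit that
is roughly 35 minutes of cushion, while nsec resolution would shrink it
to about 2 seconds, which is exactly the preemption concern above.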
> > We do not need nsec accuracy for busy polling users, if this restricts
> > range and usability under stress.
>
> Yes and no. So the standard use cases suggest using values of 50 to
> 100 microseconds, and I suspect that is probably what most people are
> using. The addition of preemption kind of threw a wrench in the works,
> because now instead of spending that time busy polling you can get
> preempted and then be off doing something else for the entire period
> of time.

That is fine. Busy polling heavy users are pinning threads to CPUs, and
the re-schedule check is basically a way to make sure ksoftirqd will get
a chance to service BH.

> What would you think of changing this so that, instead of tracking the
> total time this function is active, we tracked the total time we spent
> with preemption disabled? What I would do is move the start_time
> configuration to just after the preempt_disable() call. Then, if we end
> up yielding to another thread, we would just reset the start_time when
> we restarted, instead of trying to deal with all the extra clock
> nonsense we would otherwise have to deal with, since I don't know if we
> actually want to count time where we aren't actually doing anything. In
> addition, this would bring us closer to how NAPI already works, since
> it essentially will either find an event or, if we time out, hand
> things off to the softirq, which in turn can handle it or hand it off
> to ksoftirqd. The only item that might be more difficult to deal with
> then would be the way the times are used in fs/select.c, but I don't
> know if that is really the right way to go anyway. With the preemption
> changes and such it might just make sense to drop those bits and rely
> on just the socket polling alone.
>
> The other option is to switch over everything from using unsigned long
> to using uint64_t and time_after64. Then we can guarantee the range
> needed and then some, but then we are playing with a u64 time value on
> 32-bit architectures, which might be a bit more expensive. Even with
> that, though, I still need to clean up the sysctl, since it doesn't
> make sense to allow negative values for the busy_poll usecs to be used,
> which is currently the case.
>
> Anyway, let me know what you think and I can probably spin out a new
> set tomorrow.

Leave usec resolution; I see no point in switching to nsec as long as we
support 32-bit kernels.

If you believe min/max values should be added to the sysctls, because we
do not trust root anymore, please send patches addressing only that.

Thanks.
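P.S. In case it is useful, here is a rough sketch of what bounding those
sysctls could look like, modeled on the existing busy_poll/busy_read
entries in net/core/sysctl_net_core.c (untested, only meant to show the
proc_dointvec_minmax + extra1 floor idea):

/* Sketch only, not a tested patch. These mirror the current
 * busy_poll/busy_read ctl_table entries, switching the handler to
 * proc_dointvec_minmax with a zero floor so negative input is
 * rejected at write time.
 */
static int zero;

        {
                .procname       = "busy_poll",
                .data           = &sysctl_net_busy_poll,
                .maxlen         = sizeof(unsigned int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec_minmax,
                .extra1         = &zero,        /* refuse negative usec values */
        },
        {
                .procname       = "busy_read",
                .data           = &sysctl_net_busy_read,
                .maxlen         = sizeof(unsigned int),
                .mode           = 0644,
                .proc_handler   = proc_dointvec_minmax,
                .extra1         = &zero,        /* refuse negative usec values */
        },

Leaving .extra2 unset adds no new upper bound; the only behavior change
is that negative writes start returning -EINVAL.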