On Mon, Sep 3, 2018 at 6:35 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote:
>
> Johannes Berg <johannes@xxxxxxxxxxxxxxxx> writes:
....
> > Grant's data shows a significant difference between 6 and 7 for both
> > latency and throughput:

Minor nit: this is wgong's data, more thoughtfully processed.

> >
> > * median tpt
> >   - ~241 vs ~201 (both 1 and 5 streams)
> > * median latency
> >   - 7.5 vs 6 (1 stream)
> >   - 17.3 vs. 16.6 (5 streams)
> >
> > A 20% throughput improvement at <= 1.5ms latency cost seems like a
> > pretty reasonable trade-off?
>
> Yeah, on its face. What I'm bothered about is that it is the exact
> opposite of the results that I got from my ath10k tests (there,
> throughput *dropped* and latency doubled when going from 4 to 16 ms
> of buffering).

Hrm, yeah... that would bother me too. I think even if we don't
understand why/how that happened, at some level we need to allow
subsystems or drivers to adjust the sk_pacing_shift value. Changing
sk_pacing_shift clearly has an effect that can be optimized.

If smaller values of sk_pacing_shift increase the interval (and allow
more buffering), I'm wondering why CPU utilization gets higher. More
buffering is usually more efficient. :/

wgong: can you confirm (a) I've entered the data correctly in the
spreadsheet and (b) you've labeled the data sets correctly when you
generated the data? If either of us made a mistake, it would be good
to know. :)

I'm going to speculate that "other processing" (e.g. device-level
interrupt mitigation, or possibly the CPU scheduling behaviors which
handle TX/RX completion) could cause a "bathtub" effect similar to the
performance graphs that originally got NAPI accepted into the kernel
~15 years ago. So the "optimal" value could be different for different
architectures and different IO devices (which have different max link
rates and different host bus interconnects).
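To make the numbers being argued about concrete: as I understand it, sk_pacing_shift caps the data a socket may have queued below TCP at roughly pacing_rate >> shift bytes, i.e. about 1/2^shift seconds of traffic, so a *smaller* shift means a *longer* buffering interval. A rough back-of-the-envelope sketch (not kernel code; the 300 Mbit/s rate is just an illustrative assumption):

```python
# Hedged sketch, not kernel code: approximate how sk_pacing_shift maps
# to a buffering budget, assuming budget ~= pacing_rate >> shift.

def buffered_bytes(pacing_rate_bytes_per_sec: int, sk_pacing_shift: int) -> int:
    """Approximate queued-byte budget at a given pacing rate."""
    return pacing_rate_bytes_per_sec >> sk_pacing_shift

def buffering_ms(sk_pacing_shift: int) -> float:
    """Buffering interval in milliseconds; independent of the rate."""
    return 1000.0 / (1 << sk_pacing_shift)

if __name__ == "__main__":
    rate = 300_000_000 // 8  # assumed 300 Mbit/s link, in bytes/sec
    for shift in (10, 8, 7, 6):
        print(f"shift={shift:2d}: ~{buffering_ms(shift):6.2f} ms, "
              f"~{buffered_bytes(rate, shift):,} bytes at 300 Mbit/s")
```

Under that assumption, shift 8 works out to ~3.9 ms and shift 6 to ~15.6 ms, which matches the "from 4 to 16 ms of buffering" description above, and shift 7 vs 6 is ~7.8 ms vs ~15.6 ms.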
But honestly, I don't understand all the details of the sk_pacing_shift
value nearly as well as just about anyone else reading this thread.

> And, well, Grant's data is from a single test in a noisy
> environment where the time series graph shows that throughput is all over
> the place for the duration of the test; so it's hard to draw solid
> conclusions from (for instance, for the 5-stream test, the average
> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
> it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
> used in this test, so I can't go verify it myself; so the only thing I
> can do is grumble about it here... :)

It's a fair complaint and I agree with it. My counter-argument is that
the opposite is true too: most ideal benchmarks don't measure what most
users see.

While the data wgong provided is way more noisy than I'd like, my
overall "confidence" in the "conclusion" I offered is still positive.

cheers,
grant