On Mon, 2018-09-03 at 13:11 +0200, Toke Høiland-Jørgensen wrote:

> > 6 vs. 8, I think? But I didn't follow the full discussion.

Err, I just realized that I was completely wrong - the default, of
course, is 10. So smaller values mean *more* buffering [1]. Most of my
argumentation was thus utter garbage :-)

> > I also think that we shouldn't necessarily restrict this to "for
> > the ath10k". Is there any reason to think that this would be
> > different for different chips?
>
> No, I also think it should be possible to select a value that will
> work for all drivers. There's a strong diminishing returns effect
> here after a certain point, and I believe we should strongly push
> back against just adding more buffering to chase (single-flow)
> throughput numbers; and I'm not convinced that having different
> values for different drivers is worth the complexity.

I think I can see some point in it if the driver requires some
buffering for some specific reason? But you'd have to be able to state
that reason, I guess. I could imagine a firmware limitation that makes
it necessary to build two aggregates, or something around MU-MIMO,
etc.

> As far as I'm concerned, just showing an increase in maximum
> throughput under ideal conditions (as this initial patch submission
> did) is not sufficient argument for adding more buffering. At a
> minimum, the evaluation should also include:
>
> - Impact on latency and throughput for competing flows in the same
>   test.
> - Impact on latency for the TCP flow itself.
> - A full evaluation at low rates and in scenarios with more than one
>   client.

That seems reasonable.

> As far as the actual value, I *think* it may be that the default
> shift should be 7 (8 ms) rather than 8 (4 ms) as it currently is.
> Going back and looking at my data from when I submitted the original
> patch, it looks like the point of diminishing returns is somewhere
> between those two with ath9k (where I did most of my testing), and
> it seems reasonable that it could be slightly higher (i.e., more
> buffering) for ath10k.

Grant's data shows a significant difference between a shift of 6 and 7
for both latency and throughput:

 * median tpt
   - ~241 vs. ~201 (both 1 and 5 streams)
 * median latency
   - 7.5 vs. 6 ms (1 stream)
   - 17.3 vs. 16.6 ms (5 streams)

A 20% throughput improvement at a <= 1.5 ms latency cost seems like a
pretty reasonable trade-off? [2]

johannes
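
[1] To make the shift arithmetic concrete: TSQ limits the data queued
below a TCP socket to roughly sk_pacing_rate >> sk_pacing_shift bytes
(modulo a minimum floor), i.e. about 2^-shift seconds worth of data at
the current pacing rate. Here's a minimal standalone userspace sketch
(not kernel code) of that relationship; the 500 Mbit/s pacing rate is
just an assumed example figure:

/*
 * Illustration only: how sk_pacing_shift maps to the amount of data
 * TCP keeps queued below the socket.  Roughly, TSQ allows
 * pacing_rate >> shift bytes in flight, i.e. ~2^-shift seconds of
 * data.  The 500 Mbit/s rate is an arbitrary example.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long long pacing_rate = 500000000ULL / 8; /* bytes/s */
	const int shifts[] = { 10, 8, 7, 6 };

	for (size_t i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
		int shift = shifts[i];
		unsigned long long bytes = pacing_rate >> shift;
		double ms = 1000.0 / (double)(1ULL << shift);

		printf("shift %2d -> ~%5.2f ms of buffering (~%llu bytes at 500 Mbit/s)\n",
		       shift, ms, bytes);
	}
	return 0;
}

This prints ~1 ms for the current core default of 10, ~4 ms for 8,
~8 ms for 7 and ~16 ms for 6, which is where the "(8 ms)" and "(4 ms)"
figures above come from.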
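
[2] Working out the trade-off from the numbers above (latency in ms;
throughput units as in Grant's report):

	throughput: (241 - 201) / 201 ≈ 0.199, i.e. ~20% more
	latency:     7.5 -  6.0 = 1.5 ms  (1 stream)
	            17.3 - 16.6 = 0.7 ms  (5 streams)

so the 1.5 ms single-stream number is the worst case.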