Rajkumar Manoharan <rmanohar@xxxxxxxxxxxxxx> writes: > On 2018-11-07 06:53, Toke Høiland-Jørgensen wrote: >> Rajkumar Manoharan <rmanohar@xxxxxxxxxxxxxx> writes: >> >>> Meanwhile we did some more experiments with both modes. The experiment >>> was done in open environment and fixed rate and UDP traffic ran for 60 >>> seconds. >>> >>> Seems like push mode not honoring the configured weight. Always the >>> airtime was almost same whereas in pull-mode airtime is changing based >>> on configured weight. Hence I would like to know your results. >> >> Right, so I verified that the current version of the patch set still >> works with ath9k. However, the ath10k card I have doesn't seem to >> support peer stats, so I can't test ath10k. >> >> $ lspci | grep Qualcomm >> 03:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac >> Wireless Network Adapter >> >> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/chip_id >> 0x043202ff >> >> $ cat /sys/kernel/debug/ieee80211/phy1/ath10k/wmi_services | grep PEER >> WMI_SERVICE_PEER_CACHING - >> WMI_SERVICE_PEER_STATS - >> > > Oops... Yeah 988x firmware (10.2.4) does not have peer stats support. > >> >> Is there a way to force-enable airtime support, is this a hardware >> issue? >> > Unfortunately not. There is one more pending change that handles > airtime report from HTT tx-compl. Again it depends firmware support. > These experiments are taken with this f/w interface. Will post the > change. Right, thought so. Too bad. Guess you are doing all the ath10k testing, then ;) >>> sta1 sta2 sta3 sta4 >>> pull-mode 8s(205us) 18s(3.2ms) 8s(205us) 14s(410us) >>> 12s(256us) 12s(256us) 13s(256us) 12s(256us) >>> 14s(4ms) 13s(4ms) 14s(4ms) 13s(4ms) >>> >>> push-mode 15s(205us) 12s(3.2ms) 16s(205us) 12s(410us) >>> 15s(256us) 12s(256us) 16s(256us) 12s(256us) >>> 14s(4ms) 13s(4ms) 16s(4ms) 12s(4ms) >> >> Right, so the pull-mode results are encouraging! *Something* is >> happening, at least, even though the aggregate airtime doesn't quite >> match the ratios of the configured weights. >> >> Are you running the UDP generator on the AP itself, or on a separate >> device, BTW? If it's on the AP, the userspace socket can get throttled, >> which will interfere with results, so it's better to have it on a >> separate device (connected via ethernet). >> > Traffic b/w wired client (via ethernet) and wireless clients. Cool. >> As for push-mode, could this be because of bad buffer management? I.e., >> because there's a lag between the time airtime is registered, and the >> time that airtime usage is reported, the driver just pushes a whole >> bunch of packets to the firmware when it gets the chance, which >> prevents >> the scheduler from throttling properly. This could also explain the >> better, but not quite perfect, results in pull mode, assuming that pull >> mode results in better firmware buffer management which reduces, but >> doesn't quite remove, the lag. >> > Hmm... I agree that lag in reporting airtime may cause more data push > to hw. Right now both tx and tx-compl are serialized by > active_txq_lock. So once lock acquired by tx path, it will download > all frames. i.e it is even true for ath9k driver. Hence I am wondering > how it is working only with ath9k. If enough packets are dequeued at once that the TXQ empties and is not put back on the scheduling loop, the next_txq() loop is just going to loop through the remaining TXQs and immediately increase their quantum. In ath9k, there's a maximum of two outstanding aggregates below the TXQ level, but I think you mentioned that ath10k can queue thousands in firmware? Your patch removes the 'max 16 packets at a time' check before the call to ath10k_mac_tx_push_txq(); try adding that back and see if it helps? >> If this is indeed the reason, the queue limit patches should hopefully >> be a solution. So guess we need to get those working as well :) >> > I would prefer to baseline the basic infra into upstream first and do > enhancement on top of that. Sure, I'm fine with merging this first and building on top. > I request you to revisit maintaining per driver default. Otherwise > there would be performance impact with 256us. :( Hmm, how about we make it a driver-specified multiplier instead (which could be 4, 8 or 16 for ath10k)? That way it would still work even if userspace changes the weights. It would affect the minimum possible granularity, but that is probably acceptable; and we would be free to change the mechanism later, without affecting the userspace API. -Toke