Helmut Schaa schrieb: > On Tue, Jan 10, 2012 at 9:03 AM, Andreas Hartmann > <andihartmann@xxxxxxxxxxxxxxx> wrote: >> Meanwhile, I took a look at the tx_path with Felix patch applied. I'm >> getting this output: >> >> >> -> netperf start >> >> [38139.839028] Open BA session requested for 00:25:9c:de:4e:30 tid 0 >> [38139.848151] IEEE80211_AMPDU_TX_START >> [38139.848157] activated addBA response timer on tid 0 >> [38139.849723] switched off addBA timer for tid 0 >> [38139.849728] Aggregation is on for tid 0 >> >> -> stall >> >> [38161.686305] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 1 >> [38161.698957] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 2 >> >> [38161.698963] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.699116] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.710238] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 3 >> [38161.710240] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.710394] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.724512] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 4 >> [38161.724519] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.724659] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.735416] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 5 >> [38161.735423] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.735486] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.747815] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 6 >> [38161.747822] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.747888] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.755268] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 7 >> [38161.755271] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.758823] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.772898] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 8 >> [38161.772901] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> >> [38161.773049] ieee80211_check_pending_bar() -> ieee80211_send_bar() true tid: 0 failed_bar_ssn: 65040 >> [38161.789222] rt2x00lib_txdone - no success - low_level_stats.dot11ACKFailureCount: 9 >> ... >> >> Looks as if sending of any package is reported as broken at some >> point of time and this circle cannot be left anymore. > > Not necessarily since mac80211 will only retry the BAR if a data frame > transmission was successful. Ok - this can't be seen in the trace as I just traced channel 1 - there could be some data transfer outside of channel 1. But if I look at the debug output, I fear, that at some point the tx is broken completely as I can see at the beginning of each stall solely tx failures. > Hence, it seems as if only the BARs TX > status is reported incorrectly ... Seems so. Because each BAR can be seen in the trace and is responded by the STA. > I see two issues here: > 1) rt2800pci seems to have problems delivering the BAR _or_ doesn't > report the tx status correctly I think, that there is a problem with the tx status, as if I can see the BAR in the trace and even the response of the STA (so the BAR was probably ok), there really seems to be a problem with the tx status. > 2) If the same BAR fails consecutively we should maybe really tear down > the BA session as it was done before > > 2 is done in the below untested patch and will also work around 1. > Mind to give it a try? Yes :-) The workaround patch works fine for me (at a first glance - I didn't check for any side effects yet). The stall doesn't happen any more. But I'm not really sure, why it works: At the beginning, you just can see 9 tx failures. After a few tx failures, you can see 7 BAR's, which are pending, but never passed to ieee80211_send_bar(). - Why are they suddenly ok (are they really ok)? Or is it a logging problem (too much input at the same time)? At session 38176, you can see your patch at work :-). [27229.720767] rt2800_txdone_entry - tx failed [27229.720791] rt2800_txdone_entry - tx failed [27229.720800] rt2800_txdone_entry - tx failed [27229.720807] rt2800_txdone_entry - tx failed [27229.720814] rt2800_txdone_entry - tx failed [27229.720821] rt2800_txdone_entry - tx failed [27229.720828] rt2800_txdone_entry - tx failed [27229.720835] rt2800_txdone_entry - tx failed [27229.727130] rt2800_txdone_entry - tx failed [27229.727133] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38064> [27229.734419] rt2800_txdone_entry - tx failed [27229.734431] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38080> [27229.741731] rt2800_txdone_entry - tx failed [27229.741741] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38096> [27229.749097] rt2800_txdone_entry - tx failed [27229.749107] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38112> [27229.758136] rt2800_txdone_entry - tx failed [27229.758143] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38128> [27229.765453] rt2800_txdone_entry - tx failed [27229.765461] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38144> [27229.771836] rt2800_txdone_entry - tx failed [27229.771843] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38160> [27229.779281] rt2800_txdone_entry - tx failed [27229.779290] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38176> [27229.779315] ieee80211_check_pending_bar -> ieee80211_send_bar() retry cnt: 2 tid: 0 failed_bar_ssn: 38176 [27229.786705] rt2800_txdone_entry - tx failed [27229.786714] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38176> [27229.793439] ieee80211_check_pending_bar -> ieee80211_send_bar() retry cnt: 1 tid: 0 failed_bar_ssn: 38176 [27229.793452] ieee80211_check_pending_bar -> ieee80211_stop_tx_ba_session() retry cnt: 0 tid: 0 failed_bar_ssn: 38176 [27229.793476] Tx BA session stop requested for 00:25:9c:11:22:33 tid 0 [27229.802082] Stopping Tx BA session for 00:25:9c:11:22:33 tid 0 [27229.807455] rt2800_txdone_entry - tx failed [27229.807465] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <38176> [27229.816023] Open BA session requested for 00:25:9c:11:22:33 tid 0 [27229.834077] activated addBA response timer on tid 0 [27229.835517] switched off addBA timer for tid 0 [27229.835522] Aggregation is on for tid 0 The following test is done with reduced debugging (without the tx failed entries). But I can see the set_bar_pending calls without a send_bar, too. This block is typical and comes up about 6 times a minute (not regularly). [29007.375460] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4000> [29007.384394] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4016> [29007.391329] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4032> [29007.398646] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4048> [29007.409470] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4064> [29007.415295] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4080> [29007.425943] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4096> [29007.439950] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4112> [29007.440041] ieee80211_check_pending_bar -> ieee80211_send_bar() retry cnt: 2 tid: 0 failed_bar_ssn: 4112 [29007.440061] ieee80211_check_pending_bar -> ieee80211_send_bar() retry cnt: 1 tid: 0 failed_bar_ssn: 4112 [29007.440071] ieee80211_check_pending_bar -> ieee80211_stop_tx_ba_session() retry cnt: 0 tid: 0 failed_bar_ssn: 4112 [29007.440100] Tx BA session stop requested for 00:25:9c:11:22:33 tid 0 [29007.447797] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4112> [29007.449024] Stopping Tx BA session for 00:25:9c:11:22:33 tid 0 [29007.461327] status.c ieee80211_tx_status() calls ieee80211_set_bar_pending. tid: <0> control: <4> ssn: <4112> [29007.490023] Open BA session requested for 00:25:9c:11:22:33 tid 0 [29007.499048] activated addBA response timer on tid 0 [29007.504198] switched off addBA timer for tid 0 [29007.504203] Aggregation is on for tid 0 Thank you for your time - I really appreciate your work! Kind regards, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html