Daniel Mack <daniel@xxxxxxxxxx> writes:

> On Friday, May 18, 2018 01:28 PM, Kalle Valo wrote:
>> Daniel Mack <daniel@xxxxxxxxxx> writes:
>>
>>> On Wednesday, May 16, 2018 04:08 PM, Daniel Mack wrote:
>>>> Hence I believe that some sort of firmware internal buffer is overrun if
>>>> too many SMD requests fly in in a short amount of time. The firmware
>>>> does, however, still ack all packets just fine on the SMD channels, and
>>>> also the DXE communication flows are all healthy. No errors are reported
>>>> anywhere, but nothing is being put on the ether anymore.
>>>
>>> And FTR, there is a commit in the prima repository that caught my
>>> attention a while back:
>>>
>>> https://source.codeaurora.org/external/wlan/prima/commit/?id=93cd8f3c
>>>
>>> What this does (through an remarkable number of indirection layers) is
>>> sending the DUMP_COMMAND_REQ command with args = (274, 0, 0, 0, 0)
>>> when management frames get stuck, which smells pretty much like the
>>> issue I'm seeing. Doing the same with the mainline driver and the
>>> debugfs interface it exposes doesn't have any effect though.
>>>
>>> But even if it did work, I wouldn't see a way to detect the situation
>>> in which this is needed reliably.
>>
>> The firmware version might make a difference so I recommend always
>> mentioning the firmware version as well. For example, what if your
>> firmware does not support that command or parameter?
>
> Sure, that could be the case. FTR - the firmware I'm using is the one
> that came out of the Qualcomm r1034.2.1 BSP. It is recognized by the
> driver as 'WCN v2.0 RadioPhy vRhea_GF_1.12 with 19.2MHz XO'.

Ok, thanks. Please add that to the bug report.

>> Also I would recommend to file a bug to bugzilla.kernel.org so that all
>> the information is one place and it can be easily updated. Now it's
>> pretty difficult to get the big picture from various emails on the list.
>
> Yes, I agree it's a bit convoluted.
> However, there's already the bug
> report on 96board.org that Bjorn opened some time back, and I
> considered that sufficient. IMO, it has all the information needed,
> plus a link to a tool to reproduce the issue.
>
> https://bugs.96boards.org/show_bug.cgi?id=538

Yeah, bugs.96boards.org is fine. As long as there's one place which
collects all the information about the bug.

But IMHO the bug report is not telling much, all I get is that TX frames
get stuck but not even that is confirmed. After reading it I have at
least these questions:

* Is it really confirmed that the issue is that TX frames are stuck? For
  example, using a wireless sniffer would confirm that.

* Are only management frames stuck or does it also involve data frames?

* Based on the bug report the TX stuck issue seems to happen during
  authentication, but what happens before that? Does wcn36xx get
  disconnected from AP or what?

* Any wcn36xx logs about the issue (with or without debug logs)? Also
  matching wpasupplicant logs would help.

* Does this only happen with encryption or also in open mode?

* How long does it take with qconnman-stress to reproduce the issue?

* Does the radio environment make any difference on reproducibility? For
  example, clear environment vs lots of traffic/interference?

--
Kalle Valo