
Re: rtl8821ae keep alive not set, connection lost

On Thu, Sep 14, 2017 at 07:27:39PM +1000, James Cameron wrote:
> On Wed, Sep 13, 2017 at 07:39:35PM -0500, Larry Finger wrote:
> > On 09/13/2017 04:46 PM, James Cameron wrote:
> > >
> > >I'll give it some more testing and let you know, but it seems as
> > >capable of keeping a connection as 4.13 plus my earlier revert.
> > >
> 
> Testing went well; removing the call to enable ASPM was as good as
> changing the DBI read back to 16-bit width.
> 
> > The change I sent earlier should be as good as reverting the change
> > to write_byte in your reversion.
> 
> Yes, that would be the hope.
> 
> But with the 16-bit DBI read, the register REG_DBI_CTRL+0 is being
> read as well, in the first read in _rtl8821ae_enable_aspm_back_door,
> so perhaps reading that register has an unexpected side-effect.
> 

I've ruled that out after several days of testing different kernels
based on v4.13:

- adding an rtl_read_byte of REG_DBI_CTRL+0 in rtl8821ae_hw_init just
  after the call to enable_aspm does not solve the problem,

- adding an rtl_read_byte of REG_DBI_CTRL+0 at the start of
  _rtl8821ae_check_pcie_dma_hang does not solve the problem.

The only way to solve the problem at the moment is either:

- reverting 40b368af4b75 ("rtlwifi: Fix alignment issues"), which
  means using rtl_read_word in _rtl8821ae_dbi_read,

or

- removing the two lines that enable ASPM, as you asked me to try.
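
To make the difference between the two read widths concrete, here is a
minimal user-space sketch of the address arithmetic only. It does not
touch hardware. The base offset 0x034c and the helper names are
assumptions inferred from the dmesg lines quoted below (DBI 0x70f read
at 0x34f, DBI 0x719 read at 0x34d), not taken from any datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets inferred from the dmesg lines in this thread (assumption). */
#define DBI_RDATA_BASE 0x034c /* start of the DBI read-data window */
#define DBI_CTRL_BASE  0x0350 /* REG_DBI_CTRL+0 as named in this thread */

/* Hypothetical helper: MMIO offset where the byte for DBI address
 * 'addr' shows up after a windowed read has completed. */
static uint16_t dbi_rdata_offset(uint16_t addr)
{
	return (uint16_t)(DBI_RDATA_BASE + (addr & 3));
}

/* Hypothetical helper: returns 1 if a read of 'width' bytes starting
 * at 'offset' includes the byte at REG_DBI_CTRL+0, otherwise 0. */
static int read_touches_dbi_ctrl(uint16_t offset, unsigned int width)
{
	assert(width == 1 || width == 2);
	return offset <= DBI_CTRL_BASE && offset + width > DBI_CTRL_BASE;
}
```

For DBI address 0x70f the data byte sits at 0x34f, so a 2-byte read
there also covers 0x350, while a 1-byte read does not. For DBI address
0x719 the data byte sits at 0x34d and neither width reaches 0x350.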

> Is there any documentation for that register?  I see other code writes
> to REG_DBI_CTRL+3, in _rtl8821ae_check_pcie_dma_hang

I'll repeat and expand on this.  Is there any documentation for this
register, or the other REG_DBI_* registers?

I see that DBI windowed access in rtl8192de is different and yet very
similar.

In rtl8821ae, rtl8723be, and rtl8192de the method seems straightforward;
there are bits for address, bits for write enable by byte, and flag
bits for starting the transfer and completing.
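
The address and byte write-enable part of that layout can be sketched
in user space as follows. This is only an illustration consistent with
the instrumented dmesg lines in this thread ("070f <- ff (@870c)" and
"0719 <- 18 (@2718)"); the function names are hypothetical, the
placement of the enable bits in bits 12..15 is inferred from those two
lines, and the start/complete flag handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a DBI byte address as a dword-aligned
 * address plus one write-enable bit (bits 12..15) selecting the byte.
 * Assumes DBI addresses fit in 12 bits so they cannot collide with
 * the enable bits. */
static uint16_t dbi_write_addr(uint16_t addr)
{
	uint16_t lane = addr & 3;

	assert(addr < 0x1000);
	return (uint16_t)((addr & 0x0ffc) | (1u << (12 + lane)));
}

/* Hypothetical helper: recover the DBI byte address from an encoded
 * value that has exactly one write-enable bit set. */
static uint16_t dbi_decode_write_addr(uint16_t encoded)
{
	uint16_t enables = encoded >> 12;
	uint16_t lane = 0;

	assert(enables == 1 || enables == 2 || enables == 4 || enables == 8);
	while (!(enables & 1)) {
		enables >>= 1;
		lane++;
	}
	return (uint16_t)((encoded & 0x0ffc) | lane);
}
```

With this encoding, DBI address 0x70f maps to 0x870c and 0x719 maps to
0x2718, which matches the values in the log.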

> Evidence of read from REG_DBI_CTRL was captured with an instrumented
> kernel; git diff http://dev.laptop.org/~quozl/y/1dsQ6B.txt yielding
> these dmesg lines;
> 
> [    6.010255] rtl_pci: _rtl_pci_update_default_setting const_amdpci_aspm=03
> [    6.010338] rtl_pci: rtl_pci_enable_aspm
> [    6.034295] ieee80211 phy0: Selected rate control algorithm 'rtl_rc'
> [    6.034806] rtlwifi: rtlwifi: wireless switch is on
> [    6.196958] rtl8821ae 0000:02:00.0 wlp2s0: renamed from wlan0
> [    7.979186] rtl_pci: rtl_pci_disable_aspm
> [    7.979306] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
> [    8.295360] rtl8821ae: _rtl8821ae_enable_aspm_back_door
> [    8.295437] rtl8821ae: _rtl8821ae_dbi_read  070f -> ffff (@034f)
> [    8.295449] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
> [    8.295462] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0200 (@034d)
> [    8.295474] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
> [    8.295477] rtl_pci: rtl_pci_enable_aspm
> [    8.469734] rtl_pci: rtl_pci_disable_aspm
> [    8.469857] rtl8821ae: _rtl8821ae_check_pcie_dma_hang
> [    8.686955] rtl8821ae: _rtl8821ae_enable_aspm_back_door
> [    8.687013] rtl8821ae: _rtl8821ae_dbi_read  070f -> ffff (@034f)
> [    8.687025] rtl8821ae: _rtl8821ae_dbi_write 070f <- ff (@870c)
> [    8.687038] rtl8821ae: _rtl8821ae_dbi_read  0719 -> 0218 (@034d)
> [    8.687050] rtl8821ae: _rtl8821ae_dbi_write 0719 <- 18 (@2718)
> [    8.687053] rtl_pci: rtl_pci_enable_aspm
> 
> Observe how the windowed read of DBI register 0x70f causes a read of
> 16-bits at 0x34f, which includes first 8-bits of 0x350 REG_DBI_CTRL.
> 
> By the way, the cold boot value of DBI register 0x719 is 0x00, and
> the warm boot value is 0x18, so I'm confident there isn't a
> comprehensive register reset.  It means that BIOS has relevance; and
> this BIOS is outside my control.  BIOS variation may explain
> difficulty reproducing.

Is there a register for device reset that I can try?  It would help
to exclude BIOS.

> 
> > There has been a report (in Russian unfortunately) at
> > https://www.linux.org.ru/forum/desktop/12620193 of delays in ARP
> > handling.
> 
> Thanks.  I've considered and excluded ARP handling delay, though ARP
> renewal is a typical reason for device sleep to end.
> 
> With the call to enable ASPM disabled, instead of changing the DBI
> read to 16-bit width, what happens is that the device stops accepting
> data from the access point, packets are buffered there, and are
> transmitted as soon as the device makes the next transmission.
> 
> http://dev.laptop.org/~quozl/z/1dsQBf.txt has the ping and IP tcpdump
> to confirm this.
> 
> I've a monitor mode tcpdump I can send by private mail if required.
> In that capture, the burst of packets shows that ICMP echo requests
> were buffered by the access point.
> 
> > According to Google Translate, it reads as follows:
> > 
> > ============================================================
> > Periodically, the rtl8821ae Wi-Fi adapter ceases to respond to ARP,
> > which causes the Internet connection to drop. Wireshark looks quite
> > interesting: ARP replies can be sent in one large burst a few seconds
> > after receiving the requests, i.e. they seem to be buffered somewhere.
> 
> Yes, buffering at access point.
> 
> > I need to explore that ENOBUFS return code.
> 
> I've seen ENOBUFS up at the application level with ping too, when the
> original problem happens with v4.10 plus stable.
> 
> > Your case where the device is unresponsive to pings from another NIC
> > until the device transmits may also be an ARP problem.
> > 
> > For completeness, are you using the 2.4 or 5 GHz band? What is the
> > make/model of your AP? If it is possible for you to determine, what
> > firmware is it running?
> 
> Both 2.4 GHz and 5 GHz reproduce the problem.
> 
> Both open and WPA networks reproduce the problem.
> 
> Netgear WNDR3800 OpenWrt 12.09-beta, r33312.
> 
> Several other access points reproduce the problem, including a
> customer's TP-Link TL-WR1042ND with unknown firmware version.
> 
> No access point tested so far has failed to reproduce the problem.
> 
> Hope that helps, thanks for your ideas.
> 
> -- 
> James Cameron
> http://quozl.netrek.org/

-- 
James Cameron
http://quozl.netrek.org/


