Hi > -----Original Message----- > From: Grant Grundler [mailto:grundler@xxxxxxxxxxxx] > On Thu, Jan 31, 2019 at 2:36 PM Brian Norris <briannorris@xxxxxxxxxxxx> > wrote: > ... > > On Wed, Jan 30, 2019 at 12:02:10PM +0800, yhchuang@xxxxxxxxxxx wrote: > ... > > > +static void rtw_pci_dma_check(struct rtw_dev *rtwdev, > > > + struct rtw_pci_rx_ring *rx_ring, > > > + u32 idx) > > > +{ > > > + struct rtw_chip_info *chip = rtwdev->chip; > > > + struct rtw_pci_rx_buffer_desc *buf_desc; > > > + u32 desc_sz = chip->rx_buf_desc_sz; > > > + u16 total_pkt_size; > > > + int i; > > > + > > > + buf_desc = (struct rtw_pci_rx_buffer_desc *)(rx_ring->r.head + > > > + idx * desc_sz); > > > + for (i = 0; i < 20; i++) { > > > + total_pkt_size = le16_to_cpu(buf_desc->total_pkt_size); > > > + if (total_pkt_size) > > > + return; > > > + } > > > > > > Umm, what are you trying to do here? This is a non-sensical loop. I > > *imagine* you're trying to do some kind of timeout loop here, > > My guess is this is trying to mitigate a race between the CPU reading > memory and RTL88* device in updating this word of memory. This is the > first function called in the packet processing loop in > rtw_pci_rx_isr(). This code suggests that the CPU is seeing the > interrupt before the RTL has finished updating memory via DMA and is > likely a "Bug" in the RTL device firmware. > > I'm suggesting this is a bug for two reasons: > 1) NIC vendors historically have tried to reduce packet handling > latency by sending the interrupt a few microseconds BEFORE the DMA is > actually completed. This has always failed in some weird way whenever > the PCI host controllers were a bit slower or perhaps even coalescing > DMA writes (or something like that). > > 2) PCI-Express interrupts are "In band" (MSI or not). This means once > the interrupt arrives, the corresponding device driver should be able > to assume all DMA from the device is complete. This race is only > possible for "out of band" IRQ lines which are found on other busses > and historic PCI and PCI-X parallel busses. > > "DMA is complete" doesn't necessarily means data is visible to CPU > though on Intel X86, it usually is. The DMA API requires memory > allocated via pci_[z]alloc_consistent() be visible and cache coherent > though. So I don't expect CPU to perform any action to make the last > bits of DMA'd data visible in this case. "Streaming Data" requires > pci_unmap* calls or respect dma_sync_to_cpu() (or whatever it's > called) to guarantee the data is visible - that's not what is being > accessed in this code though AFAICT. > > > > but since > > there's nothing telling the compiler that this is anything but normal > > memory, this loop gets flattened by the compiler into a single check of > > ->total_pkt_size (I checked; my compiler gets rid of the loop). > > This is another bug in the code actually: the code accessing tx > descriptor rings should be declared volatile since they are updated by > the device which is not visible to the compiler. Compiler won't > optimize (or reorder) volatile code (by "volatile code" I mean code > that is accessing data marked "volatile"). > > > So, at a minimum, you should just remove the loop. But I'm not sure if > > this "check" function has any value at all... > > If Realtek can't fix what appears to be a bug in the firmware, I > suspect a polling loop will be required - but add a "udelay(1)" in the > loop. > > > > > > + > > > + if (i >= 20) > > > + rtw_warn(rtwdev, "pci bus timeout, drop packet\n"); > > > > ...BTW, I'm seeing this get triggered quite a bit. > > > > Do you have some kind of memory mapping/ordering issue or something? I > > wouldn't think you should expect to just drop packets on the floor so > > often like this. > > Based on the code above, I'm willing to bet you are correct the driver > has memory ordering issues. Since the descriptor rings aren't declared > volatile, the compiler is free to reorder accesses to those fields as > it sees fit. This is a real problem given the device may update the > fields in the different order than what the compiler emits as memory > accesses. > This is fixed in PATCH v5. We reset and use rx tag to sync dma. So the polling can be removed and the bus timeout has not happened again. Thanks Yan-Hsuan