Search Linux Wireless

Re: [PATCH v2 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient length

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 12, 2025 at 09:11:45AM +0800, Miaoqing Pan wrote:
> On 3/11/2025 11:20 PM, Jeff Johnson wrote:
> > On 3/11/2025 1:29 AM, Miaoqing Pan wrote:
> >> On 3/10/2025 6:09 PM, Johan Hovold wrote:
> >>> I'm still waiting for feedback from one user that can reproduce the
> >>> ring-buffer corruption very easily, but another user mentioned seeing
> >>> multiple zero-length descriptor warnings over the weekend when running
> >>> with this patch:
> >>>
> >>> 	ath11k_pci 0006:01:00.0: rxed invalid length (nbytes 0, max 2048)
> >>>
> >>> Are there ever any valid reasons for seeing a zero-length descriptor
> >>> (i.e. unrelated to the race at hand)? IIUC the warning would only be
> >>> printed when processing such descriptors a second time (i.e. when
> >>> is_desc_len0 is set).
> >>
> >> That's exactly the logic, only can see the warning in a second time. For
> >> the first time, ath12k_ce_completed_recv_next() returns '-EIO'.
> > 
> > That didn't answer Johan's first question:
> > Are there ever any valid reasons for seeing a zero-length descriptor?
>
> The events currently observed are all firmware logs. The discarded 
> packets will not affect normal operation. I will adjust the logging to 
> debug level.

That still does not answer the question whether there are ever any valid
reasons for seeing zero-length descriptors. I assume there are none?

> > We have an issue that there is a race condition where hardware updates the
> > pointer before it has filled all the data. The current solution is just to
> > read the data a second time. But if that second read also occurs before
> > hardware has updated the data, then the issue isn't fixed.
>  
> Thanks for the addition.
> 
> > So should there be some forced delay before we read a second time?
> > Or should we attempt to read more times?
> 
> The initial fix was to keep waiting for the data to be ready. The 
> observed phenomenon is that when the second read fails, subsequent reads 
> will continue to fail until the firmware's CE2 ring is full and triggers 
> an assert after timeout. However, this situation is relatively rare, and 
> in most cases, the second read will succeed. Therefore, adding a delay 
> or multiple read attempts is not useful.

The proposed fix is broken since ath11k_hal_ce_dst_status_get_length()
not just reads the length but also sets it to zero so that the updated
length may indeed never be seen.

I've taken a closer look at the driver and it seems like we're missing a
read barrier to make sure that the updated descriptor is not read until
after the head pointer.

Miaoqing, could you try the below patch with your reproducer and see if
it is enough to fix the corruption?

If so I can resend with the warning removed and include a corresponding
fix for ath12k (it looks like there are further places where barriers
are missing too).

Johan



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux