Re: [PATCH] b43: Rewrite DMA Tx status handling sanity checks

On Monday 23 November 2009 05:45:47 Larry Finger wrote:
> On 11/19/2009 03:24 PM, Michael Buesch wrote:
> > This rewrites the error handling policies in the TX status handler.
> > It tries to be error-tolerant as in "try hard to not crash the machine".
> > It won't recover from errors (that are bugs in the firmware or driver),
> > because that's impossible. However, it will return a more or less useful
> > error message and bail out. It also tries hard to use rate-limited messages
> > to not flood the syslog in case of a failure.
> 
> This patch definitely helped open-source firmware, but it is not a complete fix.

It is no fix _at_ _all_.
The patch does not change a single line of code that wasn't either an assertion
or a machine crash before.
So it just transforms assertions into more verbose assertions and crashes into
assertions without a crash.

> debug: Out of order TX status report on DMA ring 1. Expected 114, but got 146

Ok, this is what I expected.

Let's see what's going on. Here's the ring. o is unused, * is used.

ooooooooooooooo***************************************************ooooooooooooooooooooooooooo
               ^               ^                                 ^
               114             146                               newest
               oldest

So as you can see, the firmware reported a TX status for a frame right in the middle of
the ringbuffer. The new code now detects this before it leads to a double free and/or
silent memory corruption (freeing memory that is still in use).

It really is illegal to report a TX status for a frame that's not the oldest one in the ring.
The firmware is required to process all frames in-order on one ring.
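
To illustrate the in-order rule, here is a minimal sketch of such a check. It is not the
code from the patch: the struct, the function names and the ring size are hypothetical,
and it assumes one slot per frame for simplicity.

```c
#include <stdbool.h>
#include <stdio.h>

#define RING_SLOTS 256 /* hypothetical ring size */

/* Hypothetical, simplified ring state: one slot per in-flight frame. */
struct tx_ring {
	int oldest; /* slot of the oldest in-flight frame */
	int used;   /* number of slots currently in flight */
};

/* A status report is only legal for the oldest in-flight frame. */
static bool tx_status_in_order(const struct tx_ring *ring, int slot)
{
	if (ring->used == 0)
		return false; /* nothing in flight, so any report is bogus */
	return slot == ring->oldest;
}

/*
 * Handle one TX status report. On an out-of-order report, complain and
 * bail out without touching the ring, so nothing is freed twice.
 */
static bool handle_tx_status(struct tx_ring *ring, int slot)
{
	if (!tx_status_in_order(ring, slot)) {
		fprintf(stderr,
			"Out of order TX status report. Expected %d, but got %d\n",
			ring->oldest, slot);
		return false;
	}
	/* In order: release the oldest slot and advance with wrap-around. */
	ring->oldest = (ring->oldest + 1) % RING_SLOTS;
	ring->used--;
	return true;
}
```

With oldest = 114, a report for slot 146 is rejected and the ring is left untouched,
while a report for slot 114 is accepted and moves the oldest slot to 115.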

So how can this failure happen? I think there are basically three ways.

- First, the ordering within one ring really gets messed up and the firmware loses track
  of its ring pointers. I'm not sure if this is likely. Probably not.
- Second, the firmware messes up the ring membership, so it reports a TX status on the
  wrong ring. Note that the "ring" kernel pointer in the TX status report handler is
  derived from the cookie (and so the number in the message "Out of order TX status
  report on DMA ring 1" is also derived from the cookie). So it's untrustworthy in case
  of broken firmware. The firmware has QoS-like mechanisms, even if QoS is disabled.
  Maybe these mechanisms are broken.
- Third, there is the possibility of a driver bug. I rule that out as long as nobody is
  able to reproduce it with proprietary firmware.
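
To make the second point concrete, here is a sketch of how a ring number and a slot can
both be packed into one cookie. The layout (ring in the upper 4 bits, slot in the lower
12 bits) and the function names are assumptions for illustration only; they are not
taken from this mail. The point is that the handler only knows the ring from what the
firmware hands back, so a wrong cookie silently selects a wrong ring.

```c
#include <stdint.h>

/* Assumed layout: bits 15..12 = ring index + 1, bits 11..0 = slot. */
#define COOKIE_RING_SHIFT 12
#define COOKIE_SLOT_MASK  0x0FFFu

/* Pack a ring index and a slot into a 16-bit cookie. */
static uint16_t make_cookie(unsigned int ring, unsigned int slot)
{
	return (uint16_t)(((ring + 1u) << COOKIE_RING_SHIFT) |
			  (slot & COOKIE_SLOT_MASK));
}

/*
 * Unpack a cookie. Returns 0 on success, -1 if the ring field is zero,
 * which this layout never generates. Any other ring value is accepted,
 * so a cookie reported on the wrong ring cannot be detected here.
 */
static int parse_cookie(uint16_t cookie, unsigned int *ring,
			unsigned int *slot)
{
	unsigned int r = cookie >> COOKIE_RING_SHIFT;

	if (r == 0)
		return -1;
	*ring = r - 1u;
	*slot = cookie & COOKIE_SLOT_MASK;
	return 0;
}
```

Under this assumed layout, ring 1 and slot 146 give the cookie 0x2092. If the firmware
returned 0x3092 instead, the driver would look at slot 146 on ring 2 and only notice
the problem if that slot is not the oldest one there.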

-- 
Greetings, Michael.