Search Linux Wireless

Re: [RFC] p54pci: skb_over_panic, soft lockup, stall under flood

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/11/2009 09:28 AM, Quintin Pitts wrote:
> Hi,
> 
> Sorry for my lack of experience in all aspects - first time
> submitting!!!

Everyone that goes through this "right of passage" gets somewhat
discouraged by the response. My advice is to hang in.

My first advice is for you to run every submitted patch through the
check at scripts/checkpatch.pl. This one shows 95 errors and 7
warnings in 136 lines. Most of the errors are due to "DOS line
endings". We really hate carriage returns - a really useless occupier
of space unless it is _NOT_ followed by \n!

As I understand it, this patch is to fix the driver to work around
firmware errors. If that is correct, please state that clearly. If
only partially correct, then indicate which parts are to fix firmware
errors, and which are to fix driver errors. Has your analysis included
thinking about where the driver might delay to avoid firmware problems.

> In trying to get p54pci driver to be stable on my platform and hardware
> - here is a generic patch that seems to accomplish that.  Since the
> ViewSonic V210 uses the IT8152 pci bridge - some attention was needed to
> get dma related allocation in the first physical 64M.  I have verified
> that the dma related allocation is in the first 64M and dmabounce is not
> being used - just for those wondering if that was part of the problems.
> 
> Platform: ViewSonic V210 arm pxa255
> Kernel 2.6.30.5 eabi
> Wireless Drivers from compat-wireless-2009-09-30 and what I applied the below patch to.
> Firmware used: FW rev 2.13.12.0 - Softmac protocol 5.9
> 
> Wireless card: GemTek WL-850FJB minipci card.
> 
> phy0: p54 detected a LM86 firmware
> p54: rx_mtu reduced from 3240 to 2376
> phy0: FW rev 2.13.12.0 - Softmac protocol 5.9
> phy0: cryptographic accelerator WEP:YES, TKIP:YES, CCMP:YES
> phy0: hwaddr 00:90:4b:c1:06:bc, MAC:isl3890 RF:Frisbee
> phy0: Selected rate control algorithm 'minstrel'
> 
> device pci info (lspci -v):
> 
> 00:06.0 Network controller: Intersil Corporation ISL3886 [Prism Javelin/Prism Xbow] (rev 01)
> Subsystem: Intersil Corporation Device 0000
> Flags: bus master, medium devsel, latency 56, IRQ 217
> Memory at 11000000 (32-bit, non-prefetchable) [size=8K]
> Capabilities: [dc] Power Management version 1
> Kernel driver in use: p54pci
> Kernel modules: prism54, p54pci

Mush of the above is useless detail. Stating the device and the
platform should be sufficient.

> Reasons for patch was to solve the below problems.
> 
> 1.  p54p_check_rx_ring - skb_over_panic: Under a ping flood or just left
> running for a bit would panic with a skb_over_panic. Investigation
> showed for some odd reason the device/firmware instead of writing a
> length in the data rx_ring (desc->len) had instead written the whole dma
> address (host->host_addr) into location of the len/flag (host->len and
> host->flags) spot and the same dma address that was in the ring.  Added
> the following condition in p54p_check_rx_ring to trap that condition and
> trim the skb reset the len and flags only.  By the way - I used haret to
> see if it I could prove it happening under wince - located the dma
> memory that was being used for rings - and also happening under windows
> ce with the  len/flag being set to the same as the host dma.  Scanning
> the ring at 1000 times per second (I think)  In a flood or iperf.  Would
> see an occasional len/flag location get set to the same host address in
> that ring - may only happen a few times every minute.  Under normal
> operation maybe a few times a day.
> 
>    if(unlikely(len == (desc->host_addr & 0xffff)
>    && (desc->flags == ((desc->host_addr & 0xffff0000) >> 16))) )
> 
> 2.  p54p_refill_rx_ring - eventual stall: Has the potential in very busy
> (flood) to over run the last rx data processed ring index corrupting the
> next rings - causing some havoc of getting some 13 indexes difference
> between priv->rx_idx_data and ring_control host_idx on a 8 index ring.
> This appears to eventually fill up the TX queue - returning a -ENOSPC in
> p54_assign_address (txrx.c) because of ring corruption missing some TX
> releases.  Changed p54p_refill_rx_ring to take a index parm and use that
> as the last processed ring index - instead of the using the ring_control
> device_idx.
> 
> 3.  p54p_check_rx_ring - eventual stall: On ping flood - Control
> P54_CONTROL_TYPE_TXDONE rx packets that are skb reused - seem to cause a
> problem on the next time around with the same index.   Even though the
> length was not the same was still being seen as a
> P54_CONTROL_TYPE_TXDONE packet again. Side affects varied - one being
> the main end result same as the #2 listed above TX not being released
> and returning a -ENOSPC in p54_assign_address (txrx.c) - stall.
> Problem went away if did not reuse the skb but unmap it and
> dev_kfree_skb if return was zero from p54_rx. Still unclear why this
> would be - but had no problems with patch afterwards.
> 
> 4.  p54p_check_rx_ring - soft lockup in p54p_refill_rx_ring.  This only
> occurred when 5 minute iperf on a fast wireless network - Or 1 to 2 days
> of unit left up.  Discovered that the device had lost it's mind and set
> the ring_control->device_index[ring_index] exactly 0xFF or 255 less than
> it should be (ram issue??) don't know.  Happens on three of my devices
> the same way.  If left to continue - the p54p_refill_rx_ring while loop
> goes negative and soft lockup.  Trap and return if device_idx - (*index)
> greater than ring_index.  Error is only tripped the one time - meaning
> the next time p54p_check_rx_ring is called the device index is back to
> what it should have been.
> 
> 5.  p54p_open   - 1 out of 10 boots will produce device does not
> respond! or Cannot boot firmware!.    Minor - but frustrating all the
> same.
> Always rmmod p54pci and then modprobe p54pci works.  It seems if get a
> error on p54p_open trying again works.  And if p54_read_eeprom fails -
> trying again works.
> 
> The below was applied to compat-wireless-2009-09-30:
> 
> Thanks,
> 
> Quintin.
> 
> Signed-off-by: Quintin Pitts <geek4linux@xxxxxxxxx>
> 
> --- 
> 
> --- a/drivers/net/wireless/p54/p54pci.c	2009-09-29 23:13:58.000000000 -0500
> +++ b/drivers/net/wireless/p54/p54pci.c	2009-10-09 08:15:58.000000000 -0500
> @@ -131,7 +131,7 @@ static int p54p_upload_firmware(struct i
>  
>  static void p54p_refill_rx_ring(struct ieee80211_hw *dev,
>  	int ring_index, struct p54p_desc *ring, u32 ring_limit,
> -	struct sk_buff **rx_buf)
> +	struct sk_buff **rx_buf, u32 index)
>  {
>  	struct p54p_priv *priv = dev->priv;
>  	struct p54p_ring_control *ring_control = priv->ring_control;
> @@ -139,7 +139,11 @@ static void p54p_refill_rx_ring(struct i
>  
>  	idx = le32_to_cpu(ring_control->host_idx[ring_index]);
>  	limit = idx;
> -	limit -= le32_to_cpu(ring_control->device_idx[ring_index]);
> +/*
> + *           Use last processed index instead of device_idx
> + *           so we don't corrupt our ring 
> + */
> +	limit -= le32_to_cpu(index);
>  	limit = ring_limit - limit;
>  
>  	i = idx % ring_limit;
> @@ -181,9 +185,26 @@ static void p54p_check_rx_ring(struct ie
>  	struct p54p_ring_control *ring_control = priv->ring_control;
>  	struct p54p_desc *desc;
>  	u32 idx, i;
> +	int ret;
>  
> +	idx = le32_to_cpu(ring_control->device_idx[ring_index]);
>  	i = (*index) % ring_limit;
> -	(*index) = idx = le32_to_cpu(ring_control->device_idx[ring_index]);
> +	if(unlikely((idx - (*index)) > ring_limit || 
> + (le32_to_cpu(ring_control->host_idx[ring_index]) - (*index)) > ring_limit)) { 

The indentation in this section is strange.

> +  	printk(KERN_DEBUG "%s: devidx jumped *index=%d devidx=%d hostidx=%d ring_limit=%d\n",
> +	__func__,(*index),idx,ring_control->host_idx[ring_index],ring_limit);
> +/* 
> + * Do nothing things are really wrong - device index has jumped got corrupted
> + *  - wait for it to stabilize 
> + * So far device idx exactly 0xFF (255) bytes less than what it should be. 
> + * only seen to happen on very fast wireless and packet floods and/or iperf test
> + * In testing this error only encountered once - so next time around the 
> + * device index is correct.
> + * if to continue would soft lockup/hang in while loop in p54p_refill_rx_ring
> + */
> +		return;
> +		}

This section looks like one where a driver delay might resolve the
problem. I have no objections to your fix. I'm just curious.

> +	(*index) = idx;
>  	idx %= ring_limit;
>  	while (i != idx) {
>  		u16 len;
> @@ -197,25 +218,40 @@ static void p54p_check_rx_ring(struct ie
>  			i %= ring_limit;
>  			continue;
>  		}
> +		if(unlikely(len == (desc->host_addr & 0xffff) 
> +	&& (desc->flags == ((desc->host_addr & 0xffff0000) >> 16))) ) {

You have a whitespace problem (space after if) and again an
indentation problem. Usually, the form is as follows:

                if (unlikely(len == (desc->host_addr & 0xffff) &&
                   (desc->flags....

> +/* device has put device dma in desc len/flag location - will crash in skb_put
> + * desc->len and desc->flags contain the host_addr -
> + * trap before skb_put and discard
> + * ViewSonic V210 and wireless card GENTEK WL-850 , IT8152 PCI bridge 
> + * happens occasionally - no clear reason or frequency.
> + *  
> + */ 
> +		printk(KERN_DEBUG "%s: rx_ring len/flags has address - skipping!\n",__func__); 
> +                  skb_trim(skb,0);
> +		  desc->len = cpu_to_le16(priv->common.rx_mtu + 32);
> +		  desc->flags=0;

There is an indentation problem here.

> +                 
> +		} else {
> +
>  		skb_put(skb, len);
>  
> -		if (p54_rx(dev, skb)) {
> -			pci_unmap_single(priv->pdev,
> +		ret=p54_rx(dev,skb);
> +		pci_unmap_single(priv->pdev,

Here too.

>  					 le32_to_cpu(desc->host_addr),
>  					 priv->common.rx_mtu + 32,
>  					 PCI_DMA_FROMDEVICE);
> -			rx_buf[i] = NULL;
> -			desc->host_addr = 0;
> -		} else {
> -			skb_trim(skb, 0);
> -			desc->len = cpu_to_le16(priv->common.rx_mtu + 32);
> -		}
> +		if(ret==0)
> +			dev_kfree_skb(skb);
> +		rx_buf[i] = NULL;
> +		desc->host_addr = 0;
> +		} /* end of desc->len skb corrupt crash test */
>  
>  		i++;
>  		i %= ring_limit;
>  	}
>  
> -	p54p_refill_rx_ring(dev, ring_index, ring, ring_limit, rx_buf);
> +	p54p_refill_rx_ring(dev, ring_index, ring, ring_limit, rx_buf, (*index));
>  }
>  
>  /* caller must hold priv->lock */
> @@ -428,10 +464,10 @@ static int p54p_open(struct ieee80211_hw
>  	priv->rx_idx_mgmt = priv->tx_idx_mgmt = 0;
>  
>  	p54p_refill_rx_ring(dev, 0, priv->ring_control->rx_data,
> -		ARRAY_SIZE(priv->ring_control->rx_data), priv->rx_buf_data);
> +		ARRAY_SIZE(priv->ring_control->rx_data), priv->rx_buf_data, 0);
>  
>  	p54p_refill_rx_ring(dev, 2, priv->ring_control->rx_mgmt,
> -		ARRAY_SIZE(priv->ring_control->rx_mgmt), priv->rx_buf_mgmt);
> +		ARRAY_SIZE(priv->ring_control->rx_mgmt), priv->rx_buf_mgmt, 0);
>  
>  	P54P_WRITE(ring_control_base, cpu_to_le32(priv->ring_control_dma));
>  	P54P_READ(ring_control_base);
> @@ -550,9 +586,26 @@ static int __devinit p54p_probe(struct p
>  	}
>  
>  	err = p54p_open(dev);
> -	if (err)
> -		goto err_free_common;
> +	if (err) {
> +                
> +		printk(KERN_DEBUG "%s: p54p_open failed - trying again\n",__func__);
> +                msleep(10);
> +		err = p54p_open(dev);
> +		if (err)
> +			goto err_free_common;
> +        }
>  	err = p54_read_eeprom(dev);
> +	if (err)
> +	{
> +                printk(KERN_DEBUG "%s: p54_read_eeprom failed - trying again\n",__func__);
> +		p54p_stop(dev);
> +		err = p54p_open(dev);
> +                if (err)
> +			goto err_free_common;
> +		msleep(10);
> +		err = p54_read_eeprom(dev);
> +             
> +	}
>  	p54p_stop(dev);
>  	if (err)
>  		goto err_free_common;
> 

I will let Christian comment more on the technical merits of the patch
as he understands the device much better than I do.

Larry
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]
  Powered by Linux