Re: [PATCH] iio: adc: ti_am335x_adc: fix fifo overrun recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 15/03/17 18:59, Michael Engl wrote:
> On Mon, 2017-03-13 at 21:22 +0000, Jonathan Cameron wrote:
>> On 11/03/17 19:32, Michael Engl wrote:
>>>
>>> On Sam, 2017-03-11 at 18:07 +0000, Jonathan Cameron wrote:
>>>>
>>>> On 10/03/17 13:57, Michael Engl wrote:
>>>>>
>>>>>
>>>>> The tiadc_irq_h(int irq, void *private) function is handling
>>>>> FIFO
>>>>> overruns by clearing flags, disabling and enabling the ADC to
>>>>> recover.
>>>>>
>>>>> If the ADC is running in continuous mode a FIFO overrun happens
>>>>> regularly. If the disabling of the ADC happens concurrently
>>>>> with
>>>>> a new conversion. It might happen that the enabling of the ADC
>>>>> is ignored by the hardware. This stops the ADC permanently. No
>>>>> more interrupts are triggered.
>>>>>
>>>>> According to the AM335x Reference Manual (SPRUH73H October 2011
>>>>> -
>>>>> Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to
>>>>> check the ADC FSM bits in REG_ADCFSM before enabling the ADC
>>>>> again. Because the disabling of the ADC is done right after the
>>>>> current conversion has been finished.
>>>> The relevant comment in the latest version of that manual (rev O)
>>>> is at the end of 12.3.5
>>>> "The FIFO can also generate FIFOx_OVERRUN and FIFOx_UNDERFLOW
>>>> interrupts. The user can mask
>>>> these events by programming the IRQENABLE_CLR register. To clear
>>>> a
>>>> FIFO underflow or FIFO overrun
>>>> interrupt, the user should write a ‘1’ to the status bit in the
>>>> IRQSTS register. The TSC_ADC_SS does not
>>>> recover from these conditions automatically. Therefore, the
>>>> software
>>>> will need to disable and then again
>>>> enable the TSC_ADC_SS. Before the user can turn the module back
>>>> on,
>>>> the user must read the
>>>> ADCSTAT register to check if the status of the ADC FSM is idle
>>>> and
>>>> the current step is the idle step."
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> To trigger this bug it is necessary to run the ADC in
>>>>> continuous
>>>>> mode. The ADC values of all channels need to be read in an
>>>>> endless
>>>>> loop. The bug appears within the first 6 hours (~5.4 million
>>>>> handled FIFO overruns). The user space application will hang on
>>>>> reading new values from the character device.
>>>> Impressive to track this one down, but it's the sort of nasty
>>>> hardware race that we need to be very careful with when trying
>>>> to get a fix in place.
>>>>>
>>>>>
>>>>>
>>>>> Signed-off-by: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx>
>>>>> ---
>>>>>  drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++-
>>>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iio/adc/ti_am335x_adc.c
>>>>> b/drivers/iio/adc/ti_am335x_adc.c
>>>>> index ad9dec3..4282cec 100644
>>>>> --- a/drivers/iio/adc/ti_am335x_adc.c
>>>>> +++ b/drivers/iio/adc/ti_am335x_adc.c
>>>>> @@ -169,7 +169,9 @@ static irqreturn_t tiadc_irq_h(int irq,
>>>>> void
>>>>> *private)
>>>>>  {
>>>>>  	struct iio_dev *indio_dev = private;
>>>>>  	struct tiadc_device *adc_dev = iio_priv(indio_dev);
>>>>> -	unsigned int status, config;
>>>>> +	unsigned int status, config, adc_fsm;
>>>>> +	unsigned short count = 0;
>>>>> +
>>>>>  	status = tiadc_readl(adc_dev, REG_IRQSTATUS);
>>>>>  
>>>>>  	/*
>>>>> @@ -183,6 +185,15 @@ static irqreturn_t tiadc_irq_h(int irq,
>>>>> void
>>>>> *private)
>>>>>  		tiadc_writel(adc_dev, REG_CTRL, config);
>>>>>  		tiadc_writel(adc_dev, REG_IRQSTATUS,
>>>>> IRQENB_FIFO1OVRRUN
>>>>>  				| IRQENB_FIFO1UNDRFLW |
>>>>> IRQENB_FIFO1THRES);
>>>>> +
>>>>> +		/* wait for idle state.
>>>> Standard multiline comment syntax preferred.
>>>>>
>>>>>
>>>>> +		 * ADC needs to finish the current conversion
>>>>> +		 * before disabling the module
>>>>> +		 */
>>>>> +		do {
>>>>> +			adc_fsm = tiadc_readl(adc_dev,
>>>>> REG_ADCFSM);
>>>>> +		} while (adc_fsm != 0x10 && count++ < 100);
>>>> What slightly bothers me is that we are papering over the
>>>> problem,  yet
>>>> this is far from guaranteed to fix it (I think!) as we might get
>>>> a
>>>> race
>>>> as it starts the next conversion after we have checked, but
>>>> before we
>>>> disable...
>>>>
>>>> Note I'm only looking at this fairly superficially so please
>>>> point
>>>> out if I have missed something!
>>> Probably my comment is a bit confusing. The ADC has already been
>>> disabled by writing the config some lines above the while loop. 
>>>
>>> "
>>> config &= ~(CNTRLREG_TSCSSENB);
>>> tiadc_writel(adc_dev, REG_CTRL, config);
>>> "
>>> But as stated in the Reference Manual the ADC needs some time to
>>> finish
>>> the current conversion to finally disable the ADC. 
>>>
>>> Description of "Enable" Bit in the CTRL Register (table 12-14):
>>> "TSC_ADC_SS module enable bit.
>>> After programming all the steps and configuration registers, write
>>> a
>>> 1to this bit to turn on TSC_ADC_SS.
>>> Writing a 0 will disable the module (after the current
>>> conversion)."
>>>
>>> That means once the adc_fsm is equal to 0x10 (idle state) the ADC
>>> stays
>>> disabled as long as we enable the ADC again. Which is done by the
>>> original code one line below of this patch.
>> Cool. Thanks for the clarification.  I got lost in the datahsheet ;)
>>
>> Would you mind figuring out a fixes tag for this one as I want it to
>> go for stable?
> 
> From 93ed6e1d4cfd34f30e0201407c92ff8af129a689 Mon Sep 17 00:00:00 2001
> From: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx>
> Date: Wed, 15 Mar 2017 19:45:39 +0100
> Subject: [PATCH] iio: adc: ti_am335x: fix fifo overrun recovery
> 
> The tiadc_irq_h(int irq, void *private) function is handling FIFO
> overruns by clearing flags, disabling and enabling the ADC to
> recover.
> 
> If the ADC is running in continuous mode a FIFO overrun happens
> regulaly. If the disabling of the ADC happens concurrently with
> a new conversion. It might happen that the enabling of the ADC
> is ignored by the hardware. This stops the ADC permanently. No
> more interrupt are triggered.
> 
> According to the AM335x Reference Manual (SPRUH73H October 2011 -
> Revised April 2013 Chapter 12.4 and 12.5) it is necessary to
> check the ADC FSM bits in REG_ADCFSM before enabling the ADC
> again. Because the disabling of the ADC is done right after the
> current conversion has been finished.
> 
> To trigger this bug it is necessary to run the ADC in continuous
> mode. The ADC values of all channels need to be read in an endless
> loop. The bug appears within the first 6 hours (~5.4 million
> handled FIFO overruns). The user space application will hang on
> reading new values from the char device.
> 
> Fixes: ca9a563805f7a ("iio: ti_am335x_adc: Add continuous sampling
> support")
> Signed-off-by: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx>
As a general rule please don't paste a patch inside another thread.
It makes it really fiddly to apply.  All I was after was the fixes
line! 

Anyhow, thanks for tracking this down.

Applied to the fixes-togreg branch of iio.git.

Thanks,

Jonathan
> ---
>  drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iio/adc/ti_am335x_adc.c
> b/drivers/iio/adc/ti_am335x_adc.c
> index ad9dec30bb30..752e92ffce94 100644
> --- a/drivers/iio/adc/ti_am335x_adc.c
> +++ b/drivers/iio/adc/ti_am335x_adc.c
> @@ -169,7 +169,9 @@ static irqreturn_t tiadc_irq_h(int irq, void
> *private)
>  {
>  	struct iio_dev *indio_dev = private;
>  	struct tiadc_device *adc_dev = iio_priv(indio_dev);
> -	unsigned int status, config;
> +	unsigned int status, config, adc_fsm;
> +	unsigned short count = 0;
> +
>  	status = tiadc_readl(adc_dev, REG_IRQSTATUS);
>  
>  	/*
> @@ -183,6 +185,15 @@ static irqreturn_t tiadc_irq_h(int irq, void
> *private)
>  		tiadc_writel(adc_dev, REG_CTRL, config);
>  		tiadc_writel(adc_dev, REG_IRQSTATUS,
> IRQENB_FIFO1OVRRUN
>  				| IRQENB_FIFO1UNDRFLW |
> IRQENB_FIFO1THRES);
> +
> +		/* wait for idle state.
> +		   ADC needs to finish the current conversion
> +		   before disabling the module
> +		*/
> +		do {
> +			adc_fsm = tiadc_readl(adc_dev, REG_ADCFSM);
> +		} while (adc_fsm != 0x10 && count++ < 100);
> +
>  		tiadc_writel(adc_dev, REG_CTRL, (config |
> CNTRLREG_TSCSSENB));
>  		return IRQ_HANDLED;
>  	} else if (status & IRQENB_FIFO1THRES) {
> -- 
> 2.7.4
> 
>>
>> Thanks,
>>
>> Jonathan
>>>
>>>
>>>>
>>>>
>>>> We could look for a transition, which would potentially reduce
>>>> the
>>>> chance
>>>> of problems, but that's nasty as well and we'd need to disable
>>>> interrupts
>>>> probably to avoid any potential of a clash (for a non trivial
>>>> amount
>>>> of
>>>> time...)
>>>>>
>>>>>
>>>>> +
>>>>>  		tiadc_writel(adc_dev, REG_CTRL, (config |
>>>>> CNTRLREG_TSCSSENB));
>>>>>  		return IRQ_HANDLED;
>>>>>  	} else if (status & IRQENB_FIFO1THRES) {
> N�����r��y���b�X��ǧv�^�)޺{.n�+����{��*"��^n�r���z���h����&���G���h�(�階�ݢj"���m�����z�ޖ���f���h���~�mml==
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Input]     [Linux Kernel]     [Linux SCSI]     [X.org]

  Powered by Linux