On 11/03/17 19:32, Michael Engl wrote: > On Sam, 2017-03-11 at 18:07 +0000, Jonathan Cameron wrote: >> On 10/03/17 13:57, Michael Engl wrote: >>> >>> The tiadc_irq_h(int irq, void *private) function is handling FIFO >>> overruns by clearing flags, disabling and enabling the ADC to >>> recover. >>> >>> If the ADC is running in continuous mode a FIFO overrun happens >>> regularly. If the disabling of the ADC happens concurrently with >>> a new conversion. It might happen that the enabling of the ADC >>> is ignored by the hardware. This stops the ADC permanently. No >>> more interrupts are triggered. >>> >>> According to the AM335x Reference Manual (SPRUH73H October 2011 - >>> Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to >>> check the ADC FSM bits in REG_ADCFSM before enabling the ADC >>> again. Because the disabling of the ADC is done right after the >>> current conversion has been finished. >> The relevant comment in the latest version of that manual (rev O) >> is at the end of 12.3.5 >> "The FIFO can also generate FIFOx_OVERRUN and FIFOx_UNDERFLOW >> interrupts. The user can mask >> these events by programming the IRQENABLE_CLR register. To clear a >> FIFO underflow or FIFO overrun >> interrupt, the user should write a ‘1’ to the status bit in the >> IRQSTS register. The TSC_ADC_SS does not >> recover from these conditions automatically. Therefore, the software >> will need to disable and then again >> enable the TSC_ADC_SS. Before the user can turn the module back on, >> the user must read the >> ADCSTAT register to check if the status of the ADC FSM is idle and >> the current step is the idle step." >> >> >> >>> >>> >>> To trigger this bug it is necessary to run the ADC in continuous >>> mode. The ADC values of all channels need to be read in an endless >>> loop. The bug appears within the first 6 hours (~5.4 million >>> handled FIFO overruns). The user space application will hang on >>> reading new values from the character device. >> Impressive to track this one down, but it's the sort of nasty >> hardware race that we need to be very careful with when trying >> to get a fix in place. >>> >>> >>> Signed-off-by: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx> >>> --- >>> drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/iio/adc/ti_am335x_adc.c >>> b/drivers/iio/adc/ti_am335x_adc.c >>> index ad9dec3..4282cec 100644 >>> --- a/drivers/iio/adc/ti_am335x_adc.c >>> +++ b/drivers/iio/adc/ti_am335x_adc.c >>> @@ -169,7 +169,9 @@ static irqreturn_t tiadc_irq_h(int irq, void >>> *private) >>> { >>> struct iio_dev *indio_dev = private; >>> struct tiadc_device *adc_dev = iio_priv(indio_dev); >>> - unsigned int status, config; >>> + unsigned int status, config, adc_fsm; >>> + unsigned short count = 0; >>> + >>> status = tiadc_readl(adc_dev, REG_IRQSTATUS); >>> >>> /* >>> @@ -183,6 +185,15 @@ static irqreturn_t tiadc_irq_h(int irq, void >>> *private) >>> tiadc_writel(adc_dev, REG_CTRL, config); >>> tiadc_writel(adc_dev, REG_IRQSTATUS, >>> IRQENB_FIFO1OVRRUN >>> | IRQENB_FIFO1UNDRFLW | >>> IRQENB_FIFO1THRES); >>> + >>> + /* wait for idle state. >> Standard multiline comment syntax preferred. >>> >>> + * ADC needs to finish the current conversion >>> + * before disabling the module >>> + */ >>> + do { >>> + adc_fsm = tiadc_readl(adc_dev, >>> REG_ADCFSM); >>> + } while (adc_fsm != 0x10 && count++ < 100); >> What slightly bothers me is that we are papering over the >> problem, yet >> this is far from guaranteed to fix it (I think!) as we might get a >> race >> as it starts the next conversion after we have checked, but before we >> disable... >> >> Note I'm only looking at this fairly superficially so please point >> out if I have missed something! > > Probably my comment is a bit confusing. The ADC has already been > disabled by writing the config some lines above the while loop. > > " > config &= ~(CNTRLREG_TSCSSENB); > tiadc_writel(adc_dev, REG_CTRL, config); > " > But as stated in the Reference Manual the ADC needs some time to finish > the current conversion to finally disable the ADC. > > Description of "Enable" Bit in the CTRL Register (table 12-14): > "TSC_ADC_SS module enable bit. > After programming all the steps and configuration registers, write a > 1to this bit to turn on TSC_ADC_SS. > Writing a 0 will disable the module (after the current conversion)." > > That means once the adc_fsm is equal to 0x10 (idle state) the ADC stays > disabled as long as we enable the ADC again. Which is done by the > original code one line below of this patch. Cool. Thanks for the clarification. I got lost in the datahsheet ;) Would you mind figuring out a fixes tag for this one as I want it to go for stable? Thanks, Jonathan > >> >> We could look for a transition, which would potentially reduce the >> chance >> of problems, but that's nasty as well and we'd need to disable >> interrupts >> probably to avoid any potential of a clash (for a non trivial amount >> of >> time...) >>> >>> + >>> tiadc_writel(adc_dev, REG_CTRL, (config | >>> CNTRLREG_TSCSSENB)); >>> return IRQ_HANDLED; >>> } else if (status & IRQENB_FIFO1THRES) { -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html