On Mon, 2017-03-13 at 21:22 +0000, Jonathan Cameron wrote: > On 11/03/17 19:32, Michael Engl wrote: > > > > On Sam, 2017-03-11 at 18:07 +0000, Jonathan Cameron wrote: > > > > > > On 10/03/17 13:57, Michael Engl wrote: > > > > > > > > > > > > The tiadc_irq_h(int irq, void *private) function is handling > > > > FIFO > > > > overruns by clearing flags, disabling and enabling the ADC to > > > > recover. > > > > > > > > If the ADC is running in continuous mode a FIFO overrun happens > > > > regularly. If the disabling of the ADC happens concurrently > > > > with > > > > a new conversion. It might happen that the enabling of the ADC > > > > is ignored by the hardware. This stops the ADC permanently. No > > > > more interrupts are triggered. > > > > > > > > According to the AM335x Reference Manual (SPRUH73H October 2011 > > > > - > > > > Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to > > > > check the ADC FSM bits in REG_ADCFSM before enabling the ADC > > > > again. Because the disabling of the ADC is done right after the > > > > current conversion has been finished. > > > The relevant comment in the latest version of that manual (rev O) > > > is at the end of 12.3.5 > > > "The FIFO can also generate FIFOx_OVERRUN and FIFOx_UNDERFLOW > > > interrupts. The user can mask > > > these events by programming the IRQENABLE_CLR register. To clear > > > a > > > FIFO underflow or FIFO overrun > > > interrupt, the user should write a ‘1’ to the status bit in the > > > IRQSTS register. The TSC_ADC_SS does not > > > recover from these conditions automatically. Therefore, the > > > software > > > will need to disable and then again > > > enable the TSC_ADC_SS. Before the user can turn the module back > > > on, > > > the user must read the > > > ADCSTAT register to check if the status of the ADC FSM is idle > > > and > > > the current step is the idle step." > > > > > > > > > > > > > > > > > > > > > > > > > To trigger this bug it is necessary to run the ADC in > > > > continuous > > > > mode. The ADC values of all channels need to be read in an > > > > endless > > > > loop. The bug appears within the first 6 hours (~5.4 million > > > > handled FIFO overruns). The user space application will hang on > > > > reading new values from the character device. > > > Impressive to track this one down, but it's the sort of nasty > > > hardware race that we need to be very careful with when trying > > > to get a fix in place. > > > > > > > > > > > > > > > > Signed-off-by: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx> > > > > --- > > > > drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++- > > > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/iio/adc/ti_am335x_adc.c > > > > b/drivers/iio/adc/ti_am335x_adc.c > > > > index ad9dec3..4282cec 100644 > > > > --- a/drivers/iio/adc/ti_am335x_adc.c > > > > +++ b/drivers/iio/adc/ti_am335x_adc.c > > > > @@ -169,7 +169,9 @@ static irqreturn_t tiadc_irq_h(int irq, > > > > void > > > > *private) > > > > { > > > > struct iio_dev *indio_dev = private; > > > > struct tiadc_device *adc_dev = iio_priv(indio_dev); > > > > - unsigned int status, config; > > > > + unsigned int status, config, adc_fsm; > > > > + unsigned short count = 0; > > > > + > > > > status = tiadc_readl(adc_dev, REG_IRQSTATUS); > > > > > > > > /* > > > > @@ -183,6 +185,15 @@ static irqreturn_t tiadc_irq_h(int irq, > > > > void > > > > *private) > > > > tiadc_writel(adc_dev, REG_CTRL, config); > > > > tiadc_writel(adc_dev, REG_IRQSTATUS, > > > > IRQENB_FIFO1OVRRUN > > > > | IRQENB_FIFO1UNDRFLW | > > > > IRQENB_FIFO1THRES); > > > > + > > > > + /* wait for idle state. > > > Standard multiline comment syntax preferred. > > > > > > > > > > > > + * ADC needs to finish the current conversion > > > > + * before disabling the module > > > > + */ > > > > + do { > > > > + adc_fsm = tiadc_readl(adc_dev, > > > > REG_ADCFSM); > > > > + } while (adc_fsm != 0x10 && count++ < 100); > > > What slightly bothers me is that we are papering over the > > > problem, yet > > > this is far from guaranteed to fix it (I think!) as we might get > > > a > > > race > > > as it starts the next conversion after we have checked, but > > > before we > > > disable... > > > > > > Note I'm only looking at this fairly superficially so please > > > point > > > out if I have missed something! > > Probably my comment is a bit confusing. The ADC has already been > > disabled by writing the config some lines above the while loop. > > > > " > > config &= ~(CNTRLREG_TSCSSENB); > > tiadc_writel(adc_dev, REG_CTRL, config); > > " > > But as stated in the Reference Manual the ADC needs some time to > > finish > > the current conversion to finally disable the ADC. > > > > Description of "Enable" Bit in the CTRL Register (table 12-14): > > "TSC_ADC_SS module enable bit. > > After programming all the steps and configuration registers, write > > a > > 1to this bit to turn on TSC_ADC_SS. > > Writing a 0 will disable the module (after the current > > conversion)." > > > > That means once the adc_fsm is equal to 0x10 (idle state) the ADC > > stays > > disabled as long as we enable the ADC again. Which is done by the > > original code one line below of this patch. > Cool. Thanks for the clarification. I got lost in the datahsheet ;) > > Would you mind figuring out a fixes tag for this one as I want it to > go for stable? >From 93ed6e1d4cfd34f30e0201407c92ff8af129a689 Mon Sep 17 00:00:00 2001 From: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx> Date: Wed, 15 Mar 2017 19:45:39 +0100 Subject: [PATCH] iio: adc: ti_am335x: fix fifo overrun recovery The tiadc_irq_h(int irq, void *private) function is handling FIFO overruns by clearing flags, disabling and enabling the ADC to recover. If the ADC is running in continuous mode a FIFO overrun happens regulaly. If the disabling of the ADC happens concurrently with a new conversion. It might happen that the enabling of the ADC is ignored by the hardware. This stops the ADC permanently. No more interrupt are triggered. According to the AM335x Reference Manual (SPRUH73H October 2011 - Revised April 2013 Chapter 12.4 and 12.5) it is necessary to check the ADC FSM bits in REG_ADCFSM before enabling the ADC again. Because the disabling of the ADC is done right after the current conversion has been finished. To trigger this bug it is necessary to run the ADC in continuous mode. The ADC values of all channels need to be read in an endless loop. The bug appears within the first 6 hours (~5.4 million handled FIFO overruns). The user space application will hang on reading new values from the char device. Fixes: ca9a563805f7a ("iio: ti_am335x_adc: Add continuous sampling support") Signed-off-by: Michael Engl <michael.engl@xxxxxxxxxxxxxxxxx> --- drivers/iio/adc/ti_am335x_adc.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/iio/adc/ti_am335x_adc.c b/drivers/iio/adc/ti_am335x_adc.c index ad9dec30bb30..752e92ffce94 100644 --- a/drivers/iio/adc/ti_am335x_adc.c +++ b/drivers/iio/adc/ti_am335x_adc.c @@ -169,7 +169,9 @@ static irqreturn_t tiadc_irq_h(int irq, void *private) { struct iio_dev *indio_dev = private; struct tiadc_device *adc_dev = iio_priv(indio_dev); - unsigned int status, config; + unsigned int status, config, adc_fsm; + unsigned short count = 0; + status = tiadc_readl(adc_dev, REG_IRQSTATUS); /* @@ -183,6 +185,15 @@ static irqreturn_t tiadc_irq_h(int irq, void *private) tiadc_writel(adc_dev, REG_CTRL, config); tiadc_writel(adc_dev, REG_IRQSTATUS, IRQENB_FIFO1OVRRUN | IRQENB_FIFO1UNDRFLW | IRQENB_FIFO1THRES); + + /* wait for idle state. + ADC needs to finish the current conversion + before disabling the module + */ + do { + adc_fsm = tiadc_readl(adc_dev, REG_ADCFSM); + } while (adc_fsm != 0x10 && count++ < 100); + tiadc_writel(adc_dev, REG_CTRL, (config | CNTRLREG_TSCSSENB)); return IRQ_HANDLED; } else if (status & IRQENB_FIFO1THRES) { -- 2.7.4 > > Thanks, > > Jonathan > > > > > > > > > > > > > We could look for a transition, which would potentially reduce > > > the > > > chance > > > of problems, but that's nasty as well and we'd need to disable > > > interrupts > > > probably to avoid any potential of a clash (for a non trivial > > > amount > > > of > > > time...) > > > > > > > > > > > > + > > > > tiadc_writel(adc_dev, REG_CTRL, (config | > > > > CNTRLREG_TSCSSENB)); > > > > return IRQ_HANDLED; > > > > } else if (status & IRQENB_FIFO1THRES) { ��.n��������+%������w��{.n�����{��(��)��jg��������ݢj����G�������j:+v���w�m������w�������h�����٥