Re: [PATCH 0/2] serial: sh-sci: Fix .flush_buffer() issues

Hello All,

On Fri, Jun 28, 2019 at 01:51:25PM +0200, Geert Uytterhoeven wrote:
> Hi Eugeniu,
> 
> On Wed, Jun 26, 2019 at 7:34 PM Eugeniu Rosca <roscaeugeniu@xxxxxxxxx> wrote:
> > On Mon, Jun 24, 2019 at 02:35:38PM +0200, Geert Uytterhoeven wrote:
> > > This patch series attempts to fix the issues Eugeniu Rosca reported
> > > seeing, where .flush_buffer() interfered with transmit DMA operation[*].
> > >
> > > There's a third patch "dmaengine: rcar-dmac: Reject zero-length slave
> > > DMA requests", which is related to the issue, but further independent,
> > > hence submitted separately.
> > >
> > > Eugeniu: does this fix the issues you were seeing?
> >
> > Many thanks for both sh-sci and the rcar-dmac patches.
> > The fixes are very much appreciated.
> >
> > > Geert Uytterhoeven (2):
> > >   serial: sh-sci: Fix TX DMA buffer flushing and workqueue races
> > >   serial: sh-sci: Terminate TX DMA during buffer flushing
> > >
> > >  drivers/tty/serial/sh-sci.c | 33 ++++++++++++++++++++++++---------
> > >  1 file changed, 24 insertions(+), 9 deletions(-)
> >
> > I reserved some time to get a feeling about how the patches behave on
> > a real system (H3-ES2.0-ULCB-KF-M06), so here come my observations.
> 
> Thanks for your extensive testing!
> 
> > First of all, the issue I have originally reported in [0] is only
> > reproducible in absence of [4]. So, one of my questions would be how
> > do you yourself see the relationship between [1-3] and [4]?
> 
> I consider them independent.
> Just applying [4] would fix the issue for the console only, while the
> race condition can still be triggered on other serial ports.
> 
> > That said, all my testing assumes:
> >  - Vanilla tip v5.2-rc6-15-g249155c20f9b with [4] reverted.
> >  - DEBUG is undefined in {sh-sci.c,rcar-dmac.c}, since I've noticed
> >    new issues arising in the debug build, which are unrelated to [0].
> >
> > Below is the summary of my findings:
> >
> >  Version          Is [0]        Is console        Error message when
> >  (vanilla+X)      reproduced?   usable after [0]  [0] is reproduced
> >                                 is reproduced?
> >  ------------------------------------------------------------
> >  -[4]             Yes           No                [5]
> >  -[4]+[1]         Yes           No                -
> >  -[4]+[2]         Yes           Yes               [5]
> >  -[4]+[3]         Yes           Yes               [6]
> >  -[4]+[1]+[2]     No            -                 -
> >  -[4]+[1]+[2]+[3] No            -                 -
> >  pure vanilla     No            -                 -
> >
> > This looks a little too verbose, but I thought it might be interesting.
> 
> Thanks, it's very helpful to provide these results.
> 
> > The story which I see is that [1] does not fix [0] alone, but it seems
> > to depend on [2]. Furthermore, if cherry picked alone, [1] makes the
> > matters somewhat worse in the sense that it hides the error [5].
> 
> OK.
> 
> > My only question is whether [1-3] are supposed to replace [4] or they
> > are supposed to happily coexist. Since I don't see [0] being reproduced
> 
> They are meant to coexist.
> 
> > with [1-3], I personally prefer to re-enable DMA on SCIF (when the
> > latter is used as console) so that more features and code paths are
> > exercised to increase test coverage.
> 
> If a serial port is used as a console, the port is used for both DMA
> (normal use) and PIO (serial console output).  The latter can have a
> negative impact on the former, aggravating existing bugs, or triggering
> more races, even in the hardware.  So I think it's better to be more
> cautious and keep DMA disabled for the console.

Agreed.

Just a note for the record: [4] was the easiest way to resolve the reported
problem [0]. An alternative would be to implement DMA support for ttySC
console ports, but that would be non-trivial to implement and test because
of the potential for deadlocks in the console write critical paths, where
various locks are held with interrupts disabled. I see only one tty serial
driver that implements console DMA support, drivers/tty/serial/mpsc.c.
Perhaps there is a good reason why there are no other examples?
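
To illustrate the idea behind [4], here is a minimal user-space sketch of
the decision it makes: a port that serves as the console never gets TX DMA
set up, so console PIO output cannot interfere with it. This is a model
under stated assumptions, not the driver code. struct model_port and
model_request_dma() are hypothetical names, and the is_console field only
stands in for what the kernel's uart_console() helper would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct uart_port. */
struct model_port {
	bool is_console;   /* models the result of uart_console(port) */
	bool has_dma_chan; /* a DMA channel is available for this port */
	bool tx_dma;       /* set once TX DMA has been enabled */
};

/* Models the choice made when a port asks for DMA at startup. */
static void model_request_dma(struct model_port *port)
{
	assert(port != NULL);

	port->tx_dma = false;

	/*
	 * Console output is written with PIO. Keeping DMA off for the
	 * console port avoids mixing the two on the same hardware.
	 */
	if (port->is_console)
		return;

	if (port->has_dma_chan)
		port->tx_dma = true;
}
```

With this model, a console port with a DMA channel still ends up with
tx_dma == false, while any other port with a DMA channel gets
tx_dma == true. That matches Geert's point above: [4] alone covers the
console only, and the other ports still rely on [1-3].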

> > [0] https://lore.kernel.org/lkml/20190504004258.23574-3-erosca@xxxxxxxxxxxxxx/
> > [1] https://patchwork.kernel.org/patch/11012983/
> >     ("serial: sh-sci: Fix TX DMA buffer flushing and workqueue races")
> > [2] https://patchwork.kernel.org/patch/11012987/
> >     ("serial: sh-sci: Terminate TX DMA during buffer flushing")
> > [3] https://patchwork.kernel.org/patch/11012991/
> >     ("dmaengine: rcar-dmac: Reject zero-length slave DMA requests")
> > [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=099506cbbc79c0
> >     ("serial: sh-sci: disable DMA for uart_console")
> >
> > [5] rcar-dmac e7300000.dma-controller: Channel Address Error
> > [6] rcar-dmac e7300000.dma-controller: rcar_dmac_prep_slave_sg: bad parameter: len=1, id=19
> >     sh-sci e6e88000.serial: Failed preparing Tx DMA descriptor
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

-- 
Regards,
George


