Re: [PATCH] i2c: i2c-qcom-geni: Fix DMA transfer race

Doug Anderson <dianders@xxxxxxxxxxxx> · Wed, 22 Jul 2020 15:08:14 -0700

Hi,

On Tue, Jul 21, 2020 at 1:26 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, Jul 21, 2020 at 11:55 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> >
> > Quoting Doug Anderson (2020-07-21 09:18:35)
> > > On Tue, Jul 21, 2020 at 12:08 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > > >
> > > > Quoting Stephen Boyd (2020-07-20 22:59:14)
> > > > >
> > > > > I worry that we also need a dmb() here to make sure the dma buffer is
> > > > > properly mapped before this write to the device is attempted. But it may
> > > > > only matter to be before the I2C_READ.
> > > > >
> > > >
> > > > I'm suggesting this patch instead where we make geni_se_setup_m_cmd()
> > > > use a writel() so that it has the proper barrier semantics to wait for
> > > > the other memory writes that happened in program order before this point
> > > > to complete before the device is kicked to do a read or a write.
> > >
> > > Are you saying that dma_map_single() isn't guaranteed to have a
> > > barrier or something?  I tried to do some searching and found a thread
> > > [1] where someone tried to add a barrierless variant of them.  To me
> > > that means that the current APIs have barriers.
> > >
> > > ...or is there something else you're worried about?
> >
> > I'm not really thinking about dma_map_single() having a barrier or not.
> > The patch you mention is from 2010. Many things have changed in the last
> > decade. Does it have barrier semantics? The presence of a patch on the
> > mailing list doesn't mean much.
>
> Yes, it's pretty old, but if you follow the thread and look at the
> patch I'm fairly certain it's still relevant.  Specifically, following
> one thread of dma_map_single() on arm64:
>
> dma_map_single()
> -> dma_map_single_attrs()
> --> dma_map_page_attrs()
> ---> dma_direct_map_page()
> ----> arch_sync_dma_for_device()
> -----> __dma_map_area()
> ------> __dma_inv_area() which has a "dsb"
>
> I'm sure there are lots of other possible paths, but one thing pointed
> out by following that path is 'DMA_ATTR_SKIP_CPU_SYNC'.  The
> documentation of that option talks about the normal flow.  It says
> that in the normal flow that dma_map_{single,page,sg} will
> synchronize.  We are in the normal flow here.
>
> As far as I understand, the whole point of dma_map_single() is to take
> a given buffer and get it all ready so that if a device does DMA on it
> right after the function exits that it's all set.
>
>
> > Specifically I'm looking at "KERNEL I/O BARRIER EFFECTS" of
> > Documentation/memory-barriers.txt and noticing that this driver is using
> > relaxed IO accessors meaning that the reads and writes aren't ordered
> > with respect to other memory accesses. They're only ordered to
> > themselves within the same device. I'm concerned that the CPU will issue
> > the IO access to start the write DMA operation before the buffer is
> > copied over due to out of order execution.
>
> I'm not an expert either, but it really looks like dma_map_single()
> does all that we need it to.
>
>
> > I'm not an expert in this area, but this is why we ask driver authors to
> > use the non-relaxed accessors because they have the appropriate
> > semantics built in to make them easy to reason about. They do what they
> > say when they say to do it.
>
> I'm all for avoiding using the relaxed variants too except if it's
> been shown to be a performance problem.  The one hesitation I have,
> though, is that I've spent time poking a bunch at the geni SPI driver.
> We do _a lot_ of very small SPI transfers on our system.  For each of
> these it's gotta setup a lot of commands.  When I was poking I
> definitely noticed the difference between writel() and
> writel_relaxed().  If we can save a few microseconds on each one of
> these transfers it's probably worth it since it's effectively in the
> inner loop of some transfers.
>
> One option I thought of was to track the mode (DMA vs. FIFO) and only
> do writel() for DMA mode.  If you're not convinced by my arguments
> about dma_map_single(), would you be good with just doing the
> non-relaxed version if we're in DMA mode?

OK, so I did some quick benchmarking and I couldn't find any
performance regression with just always using writel() here.  Even if
dma_map_single() does guarantee that things are synced:

* There's no guarantee that all geni users will use dma_map_{xxx}.

* As Stephen says, the writel() is easier to reason about.

The change to a writel() is a bit orthogonal to the issue being
discussed here, though and it wouldn't make sense to have one patch
touch both the geni headers and also the i2c code.  Thus, I have sent
v2 without it (just with the other fixes that Stephen requested) and
also sent out a separate patch to change from writel_relaxed() to
writel().

Breadcrumbs:

[PATCH v2] i2c: i2c-qcom-geni: Fix DMA transfer race
https://lore.kernel.org/r/20200722145948.v2.1.I7efdf6efaa6edadbb690196cd4fbe3392a582c89@changeid/

[PATCH] soc: qcom-geni-se: Don't use relaxed writes when writing commands
https://lore.kernel.org/r/20200722150113.1.Ia50ab5cb8a6d3a73d302e6bdc25542d48ffd27f4@changeid/

As mentioned after the cut in the i2c change, I have kept people's
tested/reviewed tags for v2.

-Doug