Hi, On Tue, Jul 21, 2020 at 1:26 PM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > Hi, > > On Tue, Jul 21, 2020 at 11:55 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > > > Quoting Doug Anderson (2020-07-21 09:18:35) > > > On Tue, Jul 21, 2020 at 12:08 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > > > > > > > Quoting Stephen Boyd (2020-07-20 22:59:14) > > > > > > > > > > I worry that we also need a dmb() here to make sure the dma buffer is > > > > > properly mapped before this write to the device is attempted. But it may > > > > > only matter to be before the I2C_READ. > > > > > > > > > > > > > I'm suggesting this patch instead where we make geni_se_setup_m_cmd() > > > > use a writel() so that it has the proper barrier semantics to wait for > > > > the other memory writes that happened in program order before this point > > > > to complete before the device is kicked to do a read or a write. > > > > > > Are you saying that dma_map_single() isn't guaranteed to have a > > > barrier or something? I tried to do some searching and found a thread > > > [1] where someone tried to add a barrierless variant of them. To me > > > that means that the current APIs have barriers. > > > > > > ...or is there something else you're worried about? > > > > I'm not really thinking about dma_map_single() having a barrier or not. > > The patch you mention is from 2010. Many things have changed in the last > > decade. Does it have barrier semantics? The presence of a patch on the > > mailing list doesn't mean much. > > Yes, it's pretty old, but if you follow the thread and look at the > patch I'm fairly certain it's still relevant. Specifically, following > one thread of dma_map_single() on arm64: > > dma_map_single() > -> dma_map_single_attrs() > --> dma_map_page_attrs() > ---> dma_direct_map_page() > ----> arch_sync_dma_for_device() > -----> __dma_map_area() > ------> __dma_inv_area() which has a "dsb" > > I'm sure there are lots of other possible paths, but one thing pointed > out by following that path is 'DMA_ATTR_SKIP_CPU_SYNC'. The > documentation of that option talks about the normal flow. It says > that in the normal flow that dma_map_{single,page,sg} will > synchronize. We are in the normal flow here. > > As far as I understand, the whole point of dma_map_single() is to take > a given buffer and get it all ready so that if a device does DMA on it > right after the function exits that it's all set. > > > > Specifically I'm looking at "KERNEL I/O BARRIER EFFECTS" of > > Documentation/memory-barriers.txt and noticing that this driver is using > > relaxed IO accessors meaning that the reads and writes aren't ordered > > with respect to other memory accesses. They're only ordered to > > themselves within the same device. I'm concerned that the CPU will issue > > the IO access to start the write DMA operation before the buffer is > > copied over due to out of order execution. > > I'm not an expert either, but it really looks like dma_map_single() > does all that we need it to. > > > > I'm not an expert in this area, but this is why we ask driver authors to > > use the non-relaxed accessors because they have the appropriate > > semantics built in to make them easy to reason about. They do what they > > say when they say to do it. > > I'm all for avoiding using the relaxed variants too except if it's > been shown to be a performance problem. The one hesitation I have, > though, is that I've spent time poking a bunch at the geni SPI driver. > We do _a lot_ of very small SPI transfers on our system. For each of > these it's gotta setup a lot of commands. When I was poking I > definitely noticed the difference between writel() and > writel_relaxed(). If we can save a few microseconds on each one of > these transfers it's probably worth it since it's effectively in the > inner loop of some transfers. > > One option I thought of was to track the mode (DMA vs. FIFO) and only > do writel() for DMA mode. If you're not convinced by my arguments > about dma_map_single(), would you be good with just doing the > non-relaxed version if we're in DMA mode? OK, so I did some quick benchmarking and I couldn't find any performance regression with just always using writel() here. Even if dma_map_single() does guarantee that things are synced: * There's no guarantee that all geni users will use dma_map_{xxx}. * As Stephen says, the writel() is easier to reason about. The change to a writel() is a bit orthogonal to the issue being discussed here, though and it wouldn't make sense to have one patch touch both the geni headers and also the i2c code. Thus, I have sent v2 without it (just with the other fixes that Stephen requested) and also sent out a separate patch to change from writel_relaxed() to writel(). Breadcrumbs: [PATCH v2] i2c: i2c-qcom-geni: Fix DMA transfer race https://lore.kernel.org/r/20200722145948.v2.1.I7efdf6efaa6edadbb690196cd4fbe3392a582c89@changeid/ [PATCH] soc: qcom-geni-se: Don't use relaxed writes when writing commands https://lore.kernel.org/r/20200722150113.1.Ia50ab5cb8a6d3a73d302e6bdc25542d48ffd27f4@changeid/ As mentioned after the cut in the i2c change, I have kept people's tested/reviewed tags for v2. -Doug