On Tue, 9 Apr 2024 11:44:26 -0500
David Lechner <dlechner@xxxxxxxxxxxx> wrote:

> On Tue, Apr 9, 2024 at 11:09 AM Marcelo Schmitt
> <marcelo.schmitt1@xxxxxxxxx> wrote:
> >
> > On 04/08, David Lechner wrote:
> > > On Mon, Apr 8, 2024 at 9:32 AM Marcelo Schmitt
> > > <marcelo.schmitt@xxxxxxxxxx> wrote:
> > >
> > > ...
> > >
> > > I also still have doubts about using IIO_BE and 8-bit xfers when
> > > it comes to adding support later to achieve max sample rate with
> > > a SPI offload. For example, to get 2 MSPS with an 18-bit chip, it
> > > will require an approx. 33% faster SPI clock than the actual
> > > slowest clock possible because it will have to read 6 extra bits
> > > per sample. I didn't check the specs, but this may not even be
> > > physically possible without exceeding the datasheet max SPI clock
> > > rate. Also, errors could be reduced if we could actually use the
> > > slowest allowable SPI clock rate. Furthermore, the offload
> > > hardware would have to be capable of adding an extra byte per
> > > sample for 18 and 20-bit chips when piping the data to DMA in
> > > order to get the 32-bit alignment in the buffer required by IIO
> > > scan_type and the natural alignment requirements of IIO buffers
> > > in general.
> >
> > Maybe I should already implement support for reading with SPI
> > offload rather than doing it after this set is merged? So we can
> > test what happens at faster sample rates before we commit to a
> > solution.
>
> Yes, that sounds like a wise thing to do.
>
> > > > +		} data;
> > > > +		s64 timestamp __aligned(8);
> > > > +	} scan;
> > > > +	__be16 tx_buf __aligned(IIO_DMA_MINALIGN);
> > > > +	__be16 rx_buf;
> > > > +};
> > >
> > > scan.data is used as SPI rx_buf, so __aligned(IIO_DMA_MINALIGN)
> > > needs to be moved to the scan field.
> >
> > I have already tried it. Maybe I did something wrong besides buffer
> > alignment at that time. Will give it another try.
>
> This is the alignment for DMA cache coherency. So it should not have
> any effect on the actual data read, only performance.

Nope. It's a correctness issue, not a performance one (though you may
get advantages there as well). You can get potential corruption of
other fields that end up in the same cacheline - so the aim is to make
sure that nothing we might use concurrently is in that cacheline.

There was a good description of what is going on here in a talk
Wolfram gave a few years back, when he was exploring how to avoid
bounce buffers in I2C:
https://www.youtube.com/watch?v=JDwaMClvV-s
It includes links to descriptions of the fun that can happen.

The short description is that a DMA controller is allowed to grab the
whole of a cacheline (typically 64 bytes, can be bigger) incoherently
from the host (basically, it takes a copy). It can then merrily do its
operations before finally copying it back to the actual memory. The
problem lies in that there may be other data in that cacheline that is
accessed whilst the DMA controller was working on its own private
copy. Those changes will be wiped out.
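To make that concrete, the layout David is asking for would be
something like the sketch below. This is an illustration only: the
struct and field names outside the quoted fragment are invented, not
taken from the patch.

/*
 * Sketch only - assumes <linux/types.h>, <linux/spi/spi.h> and
 * <linux/iio/iio.h>. Names outside the quoted fragment are invented.
 */
struct adc_state {
	struct spi_device *spi;
	/* ... other driver state the CPU may touch at any time ... */

	/*
	 * IIO_DMA_MINALIGN on the first DMA-visible member pushes
	 * everything from here to the end of the struct into its own
	 * cacheline(s). A DMA controller that incoherently grabs those
	 * lines can then no longer wipe out concurrent CPU writes to
	 * the fields above - provided nothing below this point is
	 * touched while a transfer is in flight.
	 */
	struct {
		union {
			__be16 sample16;
			__be32 sample32;
		} data;
		s64 timestamp __aligned(8);
	} scan __aligned(IIO_DMA_MINALIGN);
	__be16 tx_buf;
	__be16 rx_buf;
};

With the attribute moved to scan, tx_buf and rx_buf no longer need
their own alignment: they can share cachelines with scan, assuming the
driver never does a register access whilst a buffered capture is in
flight.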
Now, you probably didn't see this corruption because:

a) Many controllers don't do this - either they don't cache stale
   data, or they are sufficiently coherent with CPU caches etc. that
   any changes in this 'nearby' data are correctly handled.
b) It's really hard to hit the races anyway. I've only seen it once
   when debugging real systems, but people run into this occasionally
   on IIO drivers and it is really nasty to debug.
c) On arm64 at least, in many cases the DMA core will bounce buffer
   anyway after some changes made a couple of years back.
   Unfortunately that isn't true on all architectures yet (I think,
   anyway), so we still need to be careful around this. Note some
   architectures (x86, I think) never allowed this cacheline
   corruption path in the first place, so the DMA_MINALIGN value is
   8 bytes (IIRC).

Coherency is hard (but fun if you have time, particularly the places
where it is allowed to break and how they are avoided :)

Jonathan