On 5/30/2019 11:08 AM, Ard Biesheuvel wrote:
> On Thu, 30 May 2019 at 09:46, Horia Geanta <horia.geanta@xxxxxxx> wrote:
>>
>> On 5/30/2019 8:34 AM, Herbert Xu wrote:
>>> On Wed, May 29, 2019 at 01:27:28PM -0700, Eric Biggers wrote:
>>>>
>>>> So what about the other places that also pass an IV located next to the data,
>>>> like crypto/ccm.c and crypto/adiantum.c?  If we're actually going to make this a
>> Fix for ccm is WIP.
>> We were not aware of adiantum since our crypto engine does not accelerate it.
>>
>>>> new API requirement, then we need to add a debugging option that makes the API
>>>> detect this violation so that the other places can be fixed too.
>>>>
>> IMO this is not a new crypto API requirement.
>> crypto API and its users must follow DMA API rules, besides crypto-specific ones.
>>
>> In this particular case, crypto/gcm.c is both an implementation and a crypto API
>> user, since it uses underneath ctr(aes) (and ghash).
>> Currently generic gcm implementation is breaking DMA API, since part of the dst
>> buffer (auth_tag) provided to ctr(aes) is sharing a cache line with some other
>> data structure (iv).
>>
>> The DMA API rule is mentioned in Documentation/DMA-API.txt
>>
>> .. warning::
>>
>>   Memory coherency operates at a granularity called the cache
>>   line width.  In order for memory mapped by this API to operate
>>   correctly, the mapped region must begin exactly on a cache line
>>   boundary and end exactly on one (to prevent two separately mapped
>>   regions from sharing a single cache line).
>>
>
> This is overly restrictive, and not in line with reality. The whole
> networking stack operates on buffers shifted by 2 bytes if
> NET_IP_ALIGN is left at its default value of 2. There are numerous
> examples in other places as well.
>
> Given that kmalloc() will take the cacheline granularity into account
> if necessary, the only way this issue can hit is when a single kmalloc
> buffer is written to by two different masters.
>
I guess there are only two options:
- either cache line sharing is avoided OR
- users need to be *aware* they are sharing the cache line and some
  rules / assumptions are in place on how to safely work on the data

What you are probably saying is that 2nd option is sometimes the way to go.

>>>> Also, did you consider whether there's any way to make the crypto API handle
>>>> this automatically, so that all the individual users don't have to?
>> That would probably work, but I guess it would come up with a big overhead.
>>
>> I am thinking crypto API would have to check each buffer used by src, dst
>> scatterlists is correctly aligned - starting and ending on cache line boundaries.
>>
>>>
>>> You're absolutely right Eric.
>>>
>>> What I suggested in the old thread is non-sense.  While you can
>>> force GCM to provide the right pointers you cannot force all the
>>> other crypto API users to do this.
>>>
>> Whose problem is that crypto API users don't follow the DMA API requirements?
>>
>>> It would appear that Ard's latest suggestion should fix the problem
>>> and is the correct approach.
>>>
>> I disagree.
>> We shouldn't force crypto implementations to be aware of such inconsistencies in
>> the I/O data buffers (pointed to by src/dst scatterlists) that are supposed to
>> be safely DMA mapped.
>>
>
> I'm on the fence here. On the one hand, it is slightly dodgy for the
> GCM driver to pass a scatterlist referencing a buffer that shares a
> cacheline with another buffer passed by an ordinary pointer, and for
> which an explicit requirement exists that the callee should update it
> before returning.
>
> On the other hand, I think it is reasonable to require drivers not to
> perform such updates while the scatterlist is mapped for DMA, since
> fixing it in the callers puts a disproportionate burden on them, given
> that non-coherent DMA only represents a small minority of use cases.
>
The problem with this approach is that the buffers in the scatterlist could
hypothetically share cache lines with *any* other CPU-updated data, not just
the IV in the crypto request (as it happens here).
How could a non-coherent DMA implementation cope with this?

Thanks,
Horia