On Wed, 17 Apr 2019 at 20:16, Pascal Van Leeuwen
<pvanleeuwen@xxxxxxxxxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: Ard Biesheuvel [mailto:ard.biesheuvel@xxxxxxxxxx]
> > Sent: Wednesday, April 17, 2019 11:43 PM
> > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxxxx>
> > Cc: Eric Biggers <ebiggers@xxxxxxxxxx>; linux-crypto@xxxxxxxxxxxxxxx;
> > Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Subject: Re: Question regarding crypto scatterlists / testmgr
> >
> > On Wed, 17 Apr 2019 at 14:17, Pascal Van Leeuwen
> > <pvanleeuwen@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Eric Biggers [mailto:ebiggers@xxxxxxxxxx]
> > > > Sent: Wednesday, April 17, 2019 10:24 PM
> > > > To: Pascal Van Leeuwen <pvanleeuwen@xxxxxxxxxxxxxxxx>
> > > > Cc: linux-crypto@xxxxxxxxxxxxxxx; Herbert Xu
> > > > <herbert@xxxxxxxxxxxxxxxxxxx>
> > > > Subject: Re: Question regarding crypto scatterlists / testmgr
> > > >
> > > > Hi Pascal,
> > > >
> > > > On Wed, Apr 17, 2019 at 07:51:08PM +0000, Pascal Van Leeuwen wrote:
> > > > > Hi,
> > > > >
> > > > > I'm trying to fix the inside-secure driver to pass all testmgr
> > > > > tests, and I have one final issue remaining with the AEAD
> > > > > ciphers. As it was not clear at all what the exact problem was,
> > > > > I spent some time reverse engineering testmgr, and I got the
> > > > > distinct impression that it is using scatter particles that
> > > > > cross page boundaries. On purpose, even.
> > > > >
> > > > > The inside-secure driver, however, is built on the premise that
> > > > > scatter particles are contiguous in device space, as I can't
> > > > > think of any reason why you would want to scatter/gather other
> > > > > than to handle virtual-to-physical address translation ...
> > > > > In any case, this should affect all other operations as well,
> > > > > but maybe those just got "lucky" by getting particles that were
> > > > > still contiguous in device space, despite the page crossing (to
> > > > > *really* verify this, you would have to fully randomize your
> > > > > page allocation!).
> > > > >
> > > > > Anyway, assuming that I *should* be able to handle particles
> > > > > that are *not* contiguous in device space, there should
> > > > > probably already exist some function in the kernel API that
> > > > > converts a scatterlist with non-contiguous particles into a
> > > > > scatterlist with contiguous particles, taking into account the
> > > > > presence of an IOMMU? Considering pretty much every device
> > > > > driver would need to do that?
> > > > > Does anyone know which function(s) to use for that?
> > > > >
> > > > > Regards,
> > > > > Pascal van Leeuwen
> > > > > Silicon IP Architect, Multi-Protocol Engines @ Inside Secure
> > > > >
> > > >
> > > > Indeed, since v5.1, testmgr tests scatterlist elements that cross
> > > > a page. However, the pages are guaranteed to be *physically*
> > > > contiguous. Does dma_map_sg() not handle this?
> > > >
> > > I'm not entirely sure, and the API documentation is not particularly
> > > clear on *what* dma_map_sg() actually does, but I highly doubt it,
> > > considering the particle count is only an input parameter (i.e. it
> > > can't output an increase in particles that would be required).
> > > So I think it just ensures the pages are actually flushed to memory
> > > and accessible by the device (in case an IOMMU interferes), and not
> > > much more than that.
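
For what it's worth: dma_map_sg() returns the number of DMA segments it
actually created. That count may be *smaller* than the nents you passed
in, if an IOMMU merges adjacent entries, but it is never larger, so it
indeed cannot split a particle for you. Untested sketch of the usage
model I am assuming here; example_map_request() is a made-up driver
hook, not an existing function:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int example_map_request(struct device *dev,
			       struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int mapped, i;

	mapped = dma_map_sg(dev, sgl, nents, DMA_TO_DEVICE);
	if (!mapped)
		return -ENOMEM;

	/*
	 * Walk the *mapped* count, not the original nents, and use the
	 * DMA accessors rather than page/offset: entries may have been
	 * coalesced behind an IOMMU.
	 */
	for_each_sg(sgl, sg, mapped, i) {
		dma_addr_t addr = sg_dma_address(sg);
		unsigned int len = sg_dma_len(sg);

		/* ... program one hardware descriptor per segment ... */
	}

	return 0;
}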
> > > In any case, scatter particles to be used by hardware should *not*
> > > cross any physical page boundaries.
> > > But also see the thread I had on this with Ard - seems like the
> > > crypto API already has some mechanism for enforcing this, but it's
> > > not enabled for AEAD ciphers?
> > >
> >
> > It has simply never been implemented because nobody had a need for it.
> >
> > > >
> > > > BTW, this isn't just a theoretical case. Many crypto API users do
> > > > crypto on kmalloced buffers, and those can cross a page boundary,
> > > > especially if they are large. All software crypto algorithms
> > > > handle this case.
> > > >
> > > Software sits behind the CPU's MMU and sees virtual memory as
> > > contiguous. It does not need to "handle" anything; it gets that for
> > > free. Hardware does not have that luxury, unless you have a
> > > functioning IOMMU, but that is still pretty rare.
> > > So for hardware, you need to break down your buffers into individual
> > > pages and stitch those together. That's the main use case of a
> > > scatterlist, and it requires the particles to NOT cross physical
> > > pages.
> > >
> >
> > kmalloc() is guaranteed to return physically contiguous memory, but
> > assuming that this results in contiguous DMA memory requires the DMA
> > map call to cover the whole thing, or the IOMMU may end up mapping it
> > in some other way.
> >
> > The safe approach (which the async walk seems to take) is just to
> > carve up each scatterlist entry so it does not cross any page
> > boundaries, and return it as discrete steps in the walk.
> >
> That's interesting. Is that actually true, though, or just an
> assumption? If the pages are guaranteed to be contiguous, then why
> break up the scatter chain further into individual pages?
> For our hardware, the number of particles may become a performance
> bottleneck, so the fewer particles the better. Also, the work to walk
> the chain and break it up would take up precious CPU cycles.
>

Seems like I was misreading the code: we have the following code in
skcipher_walk_next():

	if (!err && (walk->flags & SKCIPHER_WALK_PHYS)) {
		walk->src.phys.page = virt_to_page(walk->src.virt.addr);
		walk->dst.phys.page = virt_to_page(walk->dst.virt.addr);
		walk->src.phys.offset &= PAGE_SIZE - 1;
		walk->dst.phys.offset &= PAGE_SIZE - 1;
	}

but all that does is normalize the offset. In fact, this code looks
slightly dodgy to me, given that, if the offset /does/ exceed
PAGE_SIZE, it normalizes the offset but does not advance the page
pointers accordingly.

The thing to be aware of is that struct pages are not guaranteed to be
mapped on the CPU, and so a lot of the virt handling deals with
mapping/unmapping on the *CPU* side rather than the device side. So a
phys walk gives you each physically contiguous entry in turn, and it
is up to the device driver to map it for DMA if needed.

To satisfy my curiosity, I looked at the existing async drivers, and
very few actually appear to be using any of this stuff. So perhaps my
attempt to clarify things ended up achieving the opposite, and we are
really only interested in whether dma_map_sg() does what you expect in
your driver.
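
In case dma_map_sg() does not give you what you need, the carve-up
approach could look roughly like the below. Again an untested sketch;
split_sg_at_pages() is a made-up helper, not an existing kernel API:

#include <linux/kernel.h>
#include <linux/scatterlist.h>

/*
 * Rewrite src into dst such that no destination entry crosses a
 * physical page boundary. Returns the number of entries written,
 * or -EINVAL if dst is too small.
 */
static int split_sg_at_pages(struct scatterlist *src, int src_nents,
			     struct scatterlist *dst, int dst_nents)
{
	struct scatterlist *sg;
	int i, out = 0;

	sg_init_table(dst, dst_nents);

	for_each_sg(src, sg, src_nents, i) {
		/* normalize so that off is always below PAGE_SIZE */
		struct page *page = sg_page(sg) + (sg->offset >> PAGE_SHIFT);
		unsigned int off = sg->offset & ~PAGE_MASK;
		unsigned int len = sg->length;

		while (len) {
			unsigned int chunk =
				min_t(unsigned int, len, PAGE_SIZE - off);

			if (out >= dst_nents)
				return -EINVAL;
			sg_set_page(&dst[out++], page, chunk, off);
			page++;
			off = 0;
			len -= chunk;
		}
	}

	if (out)
		sg_mark_end(&dst[out - 1]);
	return out;
}

That obviously inflates the entry count, so given your performance
concerns it would only make sense as a fallback for when the DMA
mapping turns out not to be contiguous.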

> > > >
> > > > The fact that these types of issues are just being considered now
> > > > certainly isn't raising my confidence in the hardware crypto
> > > > drivers in the kernel...
> > > >
> > > Actually, this is *not* a problem with the hardware drivers. It's a
> > > problem with the API and/or how you are trying to use it. Hardware
> > > does NOT see the nice contiguous virtual memory that SW sees.
> > >
> > > If the driver may expect to receive particles that cross page
> > > boundaries - if that's the spec - fine, but then it will have to
> > > break those down into individual pages by itself. However, whoever
> > > created the inside-secure driver was under the impression that this
> > > was not supposed to be the case. And I don't know who's right or
> > > wrong there, but from a side discussion with Ard I got the
> > > impression that the Crypto API should fix this up before it reaches
> > > the driver.
> > >
> >
> > To be clear, is that driver upstream? And if so, where does it reside?
> >
> FYI: the original driver I started with is upstream:
> drivers/crypto/inside-secure
>

OK, so indeed, you are using dma_map_sg(), which seems absolutely fine
if your hardware supports that model. So apologies for the noise ...
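
One footnote on the dma_map_sg() model while I am at it, since it has
bitten drivers before: dma_unmap_sg() must be passed the original
nents, not the count that dma_map_sg() returned. Illustration only;
the example_req structure and example_unmap_request() are made up:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

struct example_req {
	struct scatterlist *sgl;
	int orig_nents;		/* what was passed to dma_map_sg() */
	int mapped;		/* what dma_map_sg() returned      */
};

static void example_unmap_request(struct device *dev,
				  struct example_req *req)
{
	/*
	 * Note: orig_nents, even though mapped may be smaller; the
	 * direction must match the one used in the map call.
	 */
	dma_unmap_sg(dev, req->sgl, req->orig_nents, DMA_TO_DEVICE);
}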