Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 2 Oct 2022 10:00:12 -0700

On Sat, Oct 1, 2022 at 3:30 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> The "force bouncing" in my series currently only checks for small
> (potentially kmalloc'ed) sizes under the assumption that intra-object
> DMA buffers were properly aligned to 128. So for something like below:

Ahh, so your forced bouncing isn't actually safe.

I would have hoped (but obviously never checked) that the force
bouncing be made really safe and look at the actual alignment of the
DMA (comparing it to the hardware coherency requirements), so that
alignment at allocation time simply wouldn't matter.

At that point, places like the ones you found would still work, they'd
just cause bouncing.

At which point you'd then have a choice of

 (a) just let it bounce

 (b) marking the allocations that led to them

and (a) might actually be perfectly fine in a lot of situations.
That's particularly true for the "random drivers" situation that may
not be all that relevant in real life, which is a *big* deal. Not
because of any performance issues, but simply because of kernel
developers not having to worry their pretty little heads about stuff
that doesn't really matter.

In fact, (a) might be perfectly ok even for drivers that *do* matter,
if they just aren't all that performance-critical and the situation
doesn't come up a lot (maybe it's a special management ioctl or
similar that just causes the possibility to come up, and it's
important that it *works*, but having a few bounces occasionally
doesn't actually matter, and all the regular IO goes the normal path).

And (b) would be triggered by actual data. Which could be fairly easy
to gather with a statistical model. For example, just making
dma_map_xyz() have a debug mode where it prints out the stack trace of
these bounces once every minute or so - statistically the call trace
will be one of the hot ones. Or, better yet, just use tracing to do
it.

That would allow us to say "DMA is immaterial for _correct_ alignment,
because we always fix it up if required", but then also find
situations where we might want to give it a gentle helper nudge.

But hey, if you're comfortable with your approach, that's fine too.
Anything that gets rid of the absolutely insane "you can't do small
allocations" is an improvement.

                   Linus