Re: [PATCH v8 0/7] Optimize dm-verity and fsverity using multibuffer hashing

Ard Biesheuvel <ardb@xxxxxxxxxx> · Thu, 13 Feb 2025 11:10:10 +0100

On Thu, 13 Feb 2025 at 05:17, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 12, 2025 at 07:47:11AM -0800, Eric Biggers wrote:
> > [ This patchset keeps getting rejected by Herbert, who prefers a
> >   complex, buggy, and slow alternative that shoehorns CPU-based hashing
> >   into the asynchronous hash API which is designed for off-CPU offload:
> >   https://lore.kernel.org/linux-crypto/cover.1730021644.git.herbert@xxxxxxxxxxxxxxxxxxx/
> >   This patchset is a much better way to do it though, and I've already
> >   been maintaining it downstream as it would not be reasonable to go the
> >   asynchronous hash route instead.  Let me know if there are any
> >   objections to me taking this patchset through the fsverity tree, or at
> >   least patches 1-5 as the dm-verity patches could go in separately. ]
>
> Yes I object.  While I very much like this idea of parallel hashing
> that you're introducing, shoehorning it into shash is restricting
> this to storage-based users.
>
> Networking is equally able to benefit from parallel hashing, and
> parallel crypto (in particular, AEAD) in general.  In fact, both
> TLS and IPsec can benefit directly from bulk submission instead
> of the current scheme where a single packet is processed at a time.
>
> But thanks for the reminder and I will be posting my patches
> soon.
>

I have to second Eric here, simply because his work has been ready to
go for a year now, while you keep rejecting it on the basis that
you're creating something better, and the only thing you have managed
to produce in the meantime didn't even work.

I strongly urge you to accept Eric's work, and if your approach is
really superior, it should be fairly easy to make that point with
working code once you get around to producing it, and we can switch
over the users then.

The increased flexibility you claim your approach will have does not
mesh with my understanding of where the opportunities for improvement
are: CPU-based SHA can be tightly interleaved at the instruction level
for a performance gain of almost 2x. Designing a more flexible
ahash-based multibuffer API that can still take advantage of this to
the same extent is not straightforward, and you going off and cooking
up something by yourself for months at a time does not inspire
confidence that this will converge any time soon, if at all.
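
To illustrate what interleaving means here, a minimal sketch follows. It
is not the SHA-256 code from the patchset: it uses 32-bit FNV-1a as a toy
stand-in, and the names toy_hash() and toy_hash_2x() are hypothetical.
It also assumes both buffers have the same length. The point is only
that the two hash states form independent dependency chains, so their
steps can be placed next to each other in a single loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a real hash function: 32-bit FNV-1a, NOT SHA-256. */
#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* One buffer at a time: every step depends on the previous one. */
static uint32_t toy_hash(const uint8_t *data, size_t len)
{
	uint32_t h = FNV_OFFSET;

	for (size_t i = 0; i < len; i++)
		h = (h ^ data[i]) * FNV_PRIME;
	return h;
}

/*
 * Two equal-length buffers in one loop (hypothetical helper). The
 * updates of ha and hb do not depend on each other, so they can be
 * interleaved step by step.
 */
static void toy_hash_2x(const uint8_t *a, const uint8_t *b, size_t len,
			uint32_t *out_a, uint32_t *out_b)
{
	uint32_t ha = FNV_OFFSET, hb = FNV_OFFSET;

	for (size_t i = 0; i < len; i++) {
		ha = (ha ^ a[i]) * FNV_PRIME;
		hb = (hb ^ b[i]) * FNV_PRIME;
	}
	*out_a = ha;
	*out_b = hb;
}
```

The interleaved version returns the same two digests as two separate
toy_hash() calls would. Whether it is actually faster depends on the
hash and the CPU; the almost 2x figure above refers to the real SHA
implementations, not to this toy.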

Also, your network use case is fairly theoretical, whereas the
fsverity and dm-verity code runs on hundreds of millions of mobile
phones in the field, so sacrificing any performance of the latter to
serve the former seems misguided to me.

So could you please remove yourself from the critical path here, and
merge this while we wait for your better alternative to materialize?

Thanks,
Ard.