Re: [REGRESSION] alg: ahash: Several tests fail during boot on Turris Omnia

Klaus Kudielka <klaus.kudielka@xxxxxxxxx> · Mon, 07 Oct 2024 22:57:00 +0200

On Mon, 2024-10-07 at 16:27 +0800, Herbert Xu wrote:
> 
> I see where the problem is.  Unfortunately this is not a regression,
> but instead we've managed to identify an existing bug.
> 
> The cesa driver is buggy when you invoke it in parallel.  This
> would've previously resulted in incorrect hashes being produced,
> which would not be easily discoverable (networking users would
> simply retry if they hit this, while storage probably doesn't
> use these algorithms at all).
> 
> What happened here is that the new async testing launches all
> built-in algorithm self-tests at the same time and in parallel.
> Previously, self-tests of built-in algorithms were launched one-by-one
> so there was only ever one test in flight at any moment.
> 
> This causes the cesa driver to be invoked in parallel, thus
> triggering the buggy code where two hash requests would be submitted
> to the hardware at the same time.
> 

Thanks a lot for these insights.

In other words: the API explicitly allows the driver to be invoked in
parallel, but the driver does not serialize access to its own hardware
resources?
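
To make the question concrete, here is a minimal userspace sketch of the
serialization in question. It is not taken from the cesa driver: all
names (toy_engine, toy_engine_hash, toy_run_parallel) are hypothetical,
the "engine" is just a struct with one state word, the hash is a toy
FNV-1a loop, and pthreads stand in for whatever primitive a kernel fix
would actually use. It only illustrates the idea that one request must
own the shared hash state from init to final.

```c
/* Userspace model only; every name here is hypothetical, not CESA driver code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REQUESTS 4
#define TOY_MSG_LEN 8
#define TOY_ROUNDS 1000

/* Stand-in for an engine that has only one set of hash state registers. */
struct toy_engine {
	pthread_mutex_t lock;
	uint32_t state;
};

static struct toy_engine engine = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* One FNV-1a step; used as a toy replacement for a real hash round. */
static uint32_t toy_step(uint32_t state, uint8_t byte)
{
	return (state ^ byte) * 16777619u;
}

/* Hash through the shared engine; the lock covers init, update and final. */
static uint32_t toy_engine_hash(const uint8_t *data, size_t len)
{
	uint32_t digest;

	pthread_mutex_lock(&engine.lock);
	engine.state = 2166136261u;
	for (size_t i = 0; i < len; i++)
		engine.state = toy_step(engine.state, data[i]);
	digest = engine.state;
	pthread_mutex_unlock(&engine.lock);
	return digest;
}

/* Same computation on private state, used as the expected value. */
static uint32_t toy_reference_hash(const uint8_t *data, size_t len)
{
	uint32_t state = 2166136261u;

	for (size_t i = 0; i < len; i++)
		state = toy_step(state, data[i]);
	return state;
}

struct toy_request {
	const uint8_t *data;
	size_t len;
	int mismatches;
};

/* Each thread repeatedly submits its own message to the shared engine. */
static void *toy_worker(void *arg)
{
	struct toy_request *req = arg;
	uint32_t expected = toy_reference_hash(req->data, req->len);

	for (int i = 0; i < TOY_ROUNDS; i++)
		if (toy_engine_hash(req->data, req->len) != expected)
			req->mismatches++;
	return NULL;
}

/* Runs all requests in parallel and returns the number of wrong digests. */
int toy_run_parallel(void)
{
	static const uint8_t msgs[TOY_NUM_REQUESTS][TOY_MSG_LEN] = {
		"aaaaaaa", "bbbbbbb", "ccccccc", "ddddddd",
	};
	struct toy_request reqs[TOY_NUM_REQUESTS];
	pthread_t threads[TOY_NUM_REQUESTS];
	int mismatches = 0;

	for (int i = 0; i < TOY_NUM_REQUESTS; i++) {
		reqs[i].data = msgs[i];
		reqs[i].len = TOY_MSG_LEN;
		reqs[i].mismatches = 0;
		int err = pthread_create(&threads[i], NULL, toy_worker, &reqs[i]);
		assert(err == 0);
		(void)err;
	}
	for (int i = 0; i < TOY_NUM_REQUESTS; i++) {
		pthread_join(threads[i], NULL);
		mismatches += reqs[i].mismatches;
	}
	return mismatches;
}
```

With the lock in place, toy_run_parallel() should return 0. If the
lock/unlock pair is removed, the threads can interleave on engine.state
and some digests may come out wrong, which is the kind of silent
corruption described above (in this toy model it is not guaranteed to
show up on every run).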

> So I think it's a good thing that the self-test has managed to
> discover this by itself, and the result is also harmless: the buggy
> algorithms are disabled.
> 
> I'll try to fix this but it's going to take some effort and I'll need
> your help as I don't have the hardware myself.

I would be happy to support development of a fix, by testing on my
spare Omnia.

If the above is true, the only other option I see is to mark the
driver as BROKEN, since existing CESA users are likely sitting on a time
bomb. As far as I know, some file systems do use hash algorithms; is
that right?

Thanks again, Klaus