Re: [BUG] More issues with arm/aes-neonbs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 8 Aug 2024 at 10:14, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> (Please note: ENTIRELY UNTESTED! It compiles for me, but I might have
> done something incredibly stupid and maybe there's some silly and
> fatal bug in what _appears_ trivially correct).

Ok, I fixed the stupid timeout check condition, and I actually ended
up testing this on my system by making the timeout be just 10ms
instead of ten seconds.

With that, I get a nice

    module 'hid-logitech-dj.ko' taking a long time to load

message about the "problem", so at least the warning seems to work.

I've committed that (with the timeout for the warning set back to
10s), not because it *fixes* anything, but because the warning itself
is hopefully useful to avoid having to debug issues like this in the
future, and because it also re-organizes the code so that any possible
"break dependency on recursion detection" thing would be easier to
deal with.

That said, the *proper* fix really is to make sure that a module
doesn't recursively try to load itself.

Because the thing that happened to make it work before the "wait until
the previous module has completely finished loading" was that we
*used* to do "wait until the both module has _almost_ completely
finished loading, then return an error if there was another copy of
it".

And it turns out that the "return an error if there's a concurrent
load" is fundmanetally racy if the concurrent loaders are actually
concurrent (not a serial recursion).

The race is typically *small*, but when I made it bigger in commit
9828ed3f695a, it actually ended up triggering in real life for modules
that had dependencies and returning an error before the module had
finished would then cause cascading errors. So the race does exist in
real life.

In the case of the actual recursive invocation, that race isn't an
issue - it's all serial - and returning an error before fully
initializing the first module is actually what you want (since it
won't _become_ fully initialized if you wait for it).

But basically the old recursion avoidance was not really reliable in
other situations, which is why we don't want to re-introduce the
"error out early" behavior.

I don't know the crypto registration API enough to even guess at what
the fix to break the recursion is.

Herbert?

              Linus




[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux