Re: x86 CPU features detection for applications (and AMX)

On 30.06.21 17:36, Thiago Macieira wrote:

Hi,

Does anyone here know why they designed this as inline operations? This
thing seems to be pretty much what typical TPUs are doing (or a subset
of it). Why not just add a TPU next to the CPU on the same chip?

To be clear: this is a SW ABI. It has nothing to do with the presence or
absence of other processing units in the system.

Well, if I'm correct, it's needed because there is some additional unit
whose state needs to be saved. And that, in turn, is necessary because
this unit is controlled directly by the usual CPU instruction stream (in
contrast to separately programmed devices like a GPU, SDMA, etc).

The moment you receive a Unix signal with SA_SIGINFO, the mcontext state needs
to be saved somewhere. Where would you save it? Please remember that:

- signal handlers can be called at any point in the execution, including
   in the middle of malloc()
- signal handlers can longjmp out of the handler back into non-handler code
- in a multithreaded application, each thread can be handling a signal
   simultaneously
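
To make the constraint concrete, here is a minimal C sketch (not from the thread; the names `on_signal`, `deliver_test_signal` and the two flags are made up for illustration) of an SA_SIGINFO handler. The third handler argument points at the saved context. On Linux that frame is written onto the stack the handler runs on before the handler is entered, which is why its size matters once tile state has to be saved as well.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set from the handler; sig_atomic_t keeps the stores async-signal-safe. */
static volatile sig_atomic_t handler_ran = 0;
static volatile sig_atomic_t context_seen = 0;

/* SA_SIGINFO handler: uctx points at a ucontext_t holding the interrupted
 * thread's saved register state (the mcontext). */
static void on_signal(int signo, siginfo_t *info, void *uctx)
{
    (void)signo;
    (void)info;
    context_seen = (uctx != NULL);
    handler_ran = 1;
}

/* Install the handler for SIGUSR1 and deliver one signal to the calling
 * thread. Returns 0 on success, -1 on failure. */
static int deliver_test_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return raise(SIGUSR1) == 0 ? 0 : -1;
}
```

Calling `deliver_test_signal()` from any thread runs the handler on that thread before `raise()` returns, so each thread needs room for its own saved context.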

Yes, the last part seems to be the trickiest point.

If we were only talking about kernel-controlled context switches (task
switches) and signal handlers always returned to the kernel, then the
kernel could handle all of that internally, without userland ever knowing
about it. But unfortunately that's not the case :(

Userspace will have to do something like:
   - check CPUID, if !AMX -> fail
   - issue prctl(), if error -> fail
   - issue XGETBV and check that the AMX bit is set, if not -> fail

Can't we do this with just a prctl() call?
IOW: ask the kernel, which will say yes or no.

That's possible. The kernel can't enable an AMX state on a system without AMX.

Good, that could at least make the API somewhat simpler.
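
For illustration, the three checks discussed above could be sketched in C roughly as follows. This is only a sketch under assumptions that are not stated in this thread: it assumes the permission request takes the form arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) with the constant values shown (which is, as far as I know, the form mainline Linux later adopted), and `amx_try_enable`, `xcr0_has_amx_state` and the `amx_status` enum are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Assumed values (not from the thread): arch_prctl request code and the
 * XSAVE state-component numbers for AMX tile config and tile data. */
#define AMX_ARCH_REQ_XCOMP_PERM 0x1023
#define AMX_XFEATURE_XTILECFG   17
#define AMX_XFEATURE_XTILEDATA  18

enum amx_status {
    AMX_OK = 0,            /* all three checks passed */
    AMX_NO_CPU = 1,        /* CPUID does not report AMX */
    AMX_NO_PERMISSION = 2, /* kernel refused or does not know the request */
    AMX_NOT_ENABLED = 3    /* XCR0 does not have the AMX state bits set */
};

/* Pure helper: both the tile-config and tile-data bits must be set in XCR0. */
static int xcr0_has_amx_state(uint64_t xcr0)
{
    const uint64_t need = (1ULL << AMX_XFEATURE_XTILECFG) |
                          (1ULL << AMX_XFEATURE_XTILEDATA);
    return (xcr0 & need) == need;
}

static enum amx_status amx_try_enable(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    /* Step 1: CPUID.(EAX=7,ECX=0):EDX bit 24 reports AMX-TILE. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
        !(edx & (1u << 24)))
        return AMX_NO_CPU;

    /* Step 2: ask the kernel for permission to use the tile data state. */
    if (syscall(SYS_arch_prctl, AMX_ARCH_REQ_XCOMP_PERM,
                (unsigned long)AMX_XFEATURE_XTILEDATA) != 0)
        return AMX_NO_PERMISSION;

    /* XGETBV faults unless the OS enabled XSAVE (CPUID.1:ECX bit 27). */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return AMX_NOT_ENABLED;

    /* Step 3: read XCR0 (ECX = 0) and check the AMX state bits. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return xcr0_has_amx_state(((uint64_t)hi << 32) | lo) ? AMX_OK
                                                         : AMX_NOT_ENABLED;
#else
    return AMX_NO_CPU; /* not an x86-64 Linux build */
#endif
}
```

If the kernel call alone were declared authoritative, steps 1 and 3 could be dropped and the caller would only look at the result of step 2.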

   - request the signal stack size / spawn threads

The signal stack is separate from the usual stack, right?
Why can't this all be done in one shot?

Yes, we're talking about the sigaltstack() call.

What is "this all" in the sentence above?

Taking care of a large enough signal stack along with enabling AMX in
one shot. This might not support all kinds of uses of sigaltstack(), but
do we really need to support them all?
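
For comparison, this is roughly what userspace has to do by hand when the two steps stay separate. It is a minimal C sketch under assumptions of mine, not something proposed in this thread: it assumes the kernel reports the minimum signal frame size through the AT_MINSIGSTKSZ auxiliary vector entry (falling back to MINSIGSTKSZ if it does not), and `HANDLER_HEADROOM`, `signal_stack_size` and `install_alt_stack` are hypothetical names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/auxv.h>

#ifndef AT_MINSIGSTKSZ
#define AT_MINSIGSTKSZ 51 /* auxv key for the kernel-reported minimum */
#endif

/* Hypothetical allowance for the handler's own frames; tune per program. */
#define HANDLER_HEADROOM (16 * 1024)

/* Pick a size that covers the kernel's signal frame plus the handler. */
static size_t signal_stack_size(void)
{
    size_t min = (size_t)getauxval(AT_MINSIGSTKSZ); /* 0 if not provided */

    if (min < (size_t)MINSIGSTKSZ)
        min = (size_t)MINSIGSTKSZ;
    return min + HANDLER_HEADROOM;
}

/* Allocate and register an alternate signal stack for the calling thread.
 * Returns 0 on success, -1 on failure. */
static int install_alt_stack(void)
{
    stack_t ss;

    ss.ss_size = signal_stack_size();
    ss.ss_sp = malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}
```

The alternate stack is per thread, so every thread that may take a signal with SA_ONSTACK would have to call `install_alt_stack()` itself. That is the bookkeeping a combined call would have to absorb.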

IMHO, the whole AMX issue is just for *new* software (and I haven't seen
a practical use of an alternate signal handler stack for aeons), so it's
not about compatibility with existing software. Theoretically we could
declare that the combination of AMX and sigaltstack() just isn't supported.
(Of course, some combinations with old libraries might break - but even
if old library code is reused, it's still new software).

Maybe not a completely satisfying idea, but perhaps something that's
much easier to achieve and gets the actual problem solved.


--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@xxxxxxxxx -- +49-151-27565287


