Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

On 5/20/21 2:41 PM, Len Brown wrote:
> So the questions are:
> 1. who calls it -- a call/thread or process?  the application?  a
> library -- which library?
> 2. is it optional, or mandatory?
> 3. if it is mandatory, what is the best way to enforce it?
> 4. should we have a "release" system call too?
> 
> 1. Every thread needs a context switch buffer.  Does every thread make
> the system call?  It seems sort of awkward for a library to always
> make a system call before doing a TMUL.  It would be functionally
> harmless, but it would add latency to an otherwise low-latency
> operation.  If some central library does it, and caches that it has
> done it before, then it would be ugly, but at least it would remove an
> unnecessary user/kernel transition.

Our system calls are *REALLY* fast.  We can even do a vsyscall for this
if we want to get the overhead down near zero.  Userspace can also cache
the "I did the prctl()" state in thread-local storage if it wants to
avoid the syscall.
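
As a rough sketch of that caching, assuming a hypothetical request call
(the real ABI is still being decided, so hypothetical_amx_request() below
is only a stand-in that counts how often the "kernel" would be entered):

```c
#include <stdbool.h>

/* Counts simulated kernel entries made by this thread (illustration only). */
static _Thread_local int amx_request_calls;

/*
 * Hypothetical stand-in for the proposed "request AMX" system call.
 * The name and return convention are assumptions, not a real ABI.
 */
static int hypothetical_amx_request(void)
{
    amx_request_calls++;
    return 0; /* 0 means success in this sketch */
}

/* Per-thread cache of the "I did the prctl()" state. */
static _Thread_local bool amx_ready;

/*
 * A library would call this before using TMUL. Only the first call on
 * each thread reaches the (simulated) kernel; later calls are a single
 * thread-local load and branch.
 */
int amx_ensure_enabled(void)
{
    if (amx_ready)
        return 0;

    int ret = hypothetical_amx_request();
    if (ret == 0)
        amx_ready = true;
    return ret;
}
```

A failed request is not cached, so the thread can retry later instead of
being stuck with a stale "no".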

> 2. If it is optional, then v5 is code complete -- because it allows
> you to allocate either explicitly via prctl, or transparently via #NM.

It needs to be mandatory.  If it's not, then nobody will use it, and
they'll suffer the dreaded SIGSEGV-on-vmalloc()-failure and start filing
bug reports.

> 3. If it is mandatory, then we should re-purpose the XFD mechanism:
> app starts with XFD armed, by default
> if app touches AMX before prctl, it takes a signal (and dies).
> When app calls prctl, allocate buffer and disarm XFD for that app (exactly
> what #NM trap does today).

Yes, that sounds like a good use of XFD.
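
To make sure we mean the same policy, here is a toy user-space model of
it. Nothing here is kernel code: struct task_model, its fields, and the
function names are made up for illustration, and malloc() stands in for
the kernel's buffer allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Size of the modeled buffer; the value is chosen for illustration. */
#define MODEL_AMX_STATE_SIZE 8192

/* Toy per-task state (hypothetical, not a kernel structure). */
struct task_model {
    bool xfd_armed;    /* true: touching AMX traps */
    void *xstate_buf;  /* context switch buffer, NULL until requested */
};

enum amx_outcome { AMX_OK, AMX_SIGNAL };

/* Every task starts with XFD armed and no AMX buffer. */
void task_model_init(struct task_model *t)
{
    t->xfd_armed = true;
    t->xstate_buf = NULL;
}

/*
 * Models the explicit request: allocate the buffer, then disarm XFD.
 * An allocation failure is returned to the caller as an error code
 * rather than surfacing later as a signal.
 */
int task_model_request(struct task_model *t)
{
    if (!t->xfd_armed)
        return 0; /* already granted */

    t->xstate_buf = malloc(MODEL_AMX_STATE_SIZE);
    if (!t->xstate_buf)
        return -1;

    t->xfd_armed = false;
    return 0;
}

/* Models an AMX instruction: a signal if XFD is still armed. */
enum amx_outcome task_model_touch_amx(const struct task_model *t)
{
    return t->xfd_armed ? AMX_SIGNAL : AMX_OK;
}
```

The point of the model is that the only failure the app can see after a
successful request is one it caused itself by skipping the request.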

> 4. I don't see a justification for a release concept, but it is
> possible -- though sort of sticky with possible nested calls from
> combinations of apps and libraries.  If that were sorted out by a
> central library, then the actual system call on the last release per
> thread would re-arm XFD to prevent access until the next explicit
> request.  Unclear if it is important that the kernel actually do the
> free -- some things might run faster if we keep it around...

I think it would be more of a get/put model than an allocate/free model.

The "put" could effectively be a noop for now.  But, if we don't put
this in the ABI up front, we can't add it later.  That means that we
could never add a lazy-free, even if we wanted to.
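
Roughly, and only as a sketch: a central library could keep a per-thread
use count and enter the kernel on the first get and the last put. The
hypothetical_kernel_get()/hypothetical_kernel_put() functions below are
placeholders for whatever the ABI ends up being, not existing calls.

```c
#include <stdbool.h>

/* Per-thread count of nested users (app plus libraries). */
static _Thread_local unsigned int amx_users;

/* Models "XFD disarmed for this thread" (illustration only). */
static _Thread_local bool amx_granted;

/* Hypothetical placeholder for the kernel "get" operation. */
static int hypothetical_kernel_get(void)
{
    amx_granted = true;
    return 0;
}

/*
 * Hypothetical placeholder for the kernel "put" operation.
 * Deliberately a no-op for now: having it in the ABI leaves room for
 * a later kernel to re-arm XFD or free the buffer lazily.
 */
static void hypothetical_kernel_put(void)
{
}

/* Take a reference; only the first one per thread enters the kernel. */
int amx_get(void)
{
    if (amx_users == 0) {
        int ret = hypothetical_kernel_get();
        if (ret)
            return ret;
    }
    amx_users++;
    return 0;
}

/* Drop a reference; only the last one per thread enters the kernel. */
void amx_put(void)
{
    if (amx_users == 0)
        return; /* unbalanced put, ignored in this sketch */

    if (--amx_users == 0)
        hypothetical_kernel_put();
}
```

Nested get/put pairs from an app and its libraries balance out in the
count, which is the "sticky" part Len mentions handled in one place.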


