Hi Chrisma,
On 12/22/2021 8:17 AM, Chrisma Pakha wrote:
The following is our understanding of the proposed User Interrupts feature.
Thank you for giving this some thought.
We have been exploring how user-level interrupts (UIs) can be used to
improve performance and programmability in several different areas:
e.g., parallel programming, memory management, I/O, and floating-point
libraries.
Can you please share more details on this? It would really help improve
the API design.
# Current Use Cases
The current RFC is focused on sending an interrupt from one user-space
thread (UST) to another user-space thread (UST2UST). These threads
could be in different processes, as long as the sender has access to
the receiver's User Interrupt File Descriptor (uifd). Based on our
understanding, UIs are currently targeted as a low-overhead
alternative to current IPC mechanisms.
That's correct.
# Preparing for future use cases
We would appreciate it if someone could point out an example of a
kernel-to-user-space-thread (K2UST) UI.
The idea here is to improve kernel-to-user event notification latency.
Theoretically, this can be useful when the kernel sees an event
completion on one CPU but wants to signal (notify) a thread actively
running on some other CPU. The receiver thread can save some cycles by
avoiding the ring transitions needed to receive the event.
io_uring is one example of kernel-to-user event notification. We are
evaluating whether a UINTR-based completion mechanism would offer any
benefit over eventfd-based completions. The benefits in practice are
yet to be measured and proven.
In our work, we have also been exploring precise UIs from the
currently running thread. We call these CPU to UST (CPU2UST) UIs.
For example, a SIGSEGV generated by writing to a read-only page, or a
SIGFPE generated by dividing a number by zero.
It is definitely possible for CPU events to be delivered as User
Interrupts in the future. The hardware architecture for this is still
being worked on internally.
However, our focus isn't on delivering exceptions as User Interrupts.
Do you have details on what type of benefit is expected?
- QUESTION: Is there a rough draft/plan that we can refer to that
describes the current thinking on these three cases?
- QUESTION: Are there use cases for K2UST, or is K2UST the same as CPU2UST?
No, K2UST isn't the same as CPU2UST. We would expect limited benefits
from K2UST; CPU2UST, on the other hand, can provide a significant
speedup since it avoids the kernel completely.
Unfortunately, due to the large scope of the feature, the hardware
architecture development is happening in stages. I don't have detailed
plans for each of the sources of User Interrupts.
Here is our rough plan:
1. Provide a common infrastructure to receive User Interrupts. This is
independent of the source of the interrupt. The intention here is to
keep the software APIs generic and extendable so that future sources can
be added without causing much disturbance to the older APIs.
2. Introduce various sources of User Interrupts in stages:
UST2UST - This RFC. Available in the upcoming Sapphire Rapids processor.
K2UST - Also available in upcoming Sapphire Rapids. Working towards
proving the value before sending something out.
D2UST (device to UST) - Future processor. Hardware architecture being
worked on internally. Not much to share right now.
CPU2UST - Future processor. Hardware architecture being worked on
internally. Not much to share right now.
# Basic Understanding
The overall description you have mentioned below looks good to me. I
have added some minor comments for clarification.
Also, the abbreviations that you have used are somewhat different from
the ones I have used in the patches.
First, we would like to make sure that our understanding of the
terminology and the data structures is correct.
- User Interrupt Vector (UIV): The identity of the user interrupt.
- User Interrupt Target Table (UITT):
This allows the sender to locate the "address" of the receiver
through the uifd.
The UITT refers to the UPID (User Posted Interrupt Descriptor) address,
which is different from the uifd that you mention below.
Below outlines our understanding of the current API for UIs.
All of the statements below seem accurate.
However, some of the restrictions below are due to hardware design and
some are mainly due to the software implementation. The software design
and APIs might change significantly as this patch series evolves.
Please feel free to provide input wherever you think the APIs can be
improved.
- Each thread that can receive UIs has exactly one handler
registered with `uintr_register_handler` (a syscall).
- Each thread that registers a handler calls `uintr_create_fd` for
every user-level interrupt vector (UIV) that it expects to receive.
- The only information delivered to the handler is the UIV.
- There are 64 UIVs that can be used per thread.
Though only one generic handler is registered with the hardware, an
application can choose to implement 64 unique sub-handlers in user space
based on each unique UIV.
- A thread that wants to send a UI must register the receiver's uifd
with `uintr_register_sender` (a syscall).
This returns an index the sender uses to locate the receiver.
- `_senduipi(index)` sends a user interrupt to a particular destination.
The sender's UITT and index determine the destination.
- A thread uses `_stui` (and `_clui`) to enable (and disable) the
reception of UIs.
- For now, there is no mechanism to mask a particular UIV.
- A UI is delivered to the receiver immediately only if it is currently
running.
- If a thread calls `uintr_wait()`, it will be scheduled only
after receiving a UI.
There is no guarantee on the delay between the processor receiving
the UI and the thread being scheduled.
- If a thread is the target of a UI and another thread is running, or
the target thread is blocked in the kernel,
then the target thread will handle the UI when it is next scheduled.
- Ordinary interrupts (interrupts delivered with CPL=0) have higher
priority than user interrupts.
- The UI handler only saves general-purpose registers (e.g., it does
not save floating-point registers).
GCC generates the code that saves and restores these registers when the
`-muintr` flag is used along with the `interrupt` attribute.
Applications can choose to save floating-point registers as part of the
interrupt handler as well.
To make this easier for applications, we are working on a thin library
that can help with common functionality such as saving floating-point
registers or redirecting to 64 sub-handlers.
- User Interrupts with higher UIV are given a higher priority than those
with smaller UIV.
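The points above (one registered handler, the UIV as its only input, 64 possible sub-handlers, and higher UIVs taking priority) can be illustrated with a small software model. This is a sketch for illustration only, not the thin library mentioned above: `register_sub_handler`, `generic_handler`, `highest_pending`, `drain_pending` and `record_vector` are hypothetical names, and the 64-bit `pending` word merely stands in for the hardware's record of pending UIVs. In the real patches the registered handler is compiled with `-muintr` and the `interrupt` attribute; here `generic_handler()` is an ordinary function so the dispatch logic can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_UIVS 64

/* Hypothetical sub-handler type: receives the UIV it was dispatched for. */
typedef void (*sub_handler_fn)(unsigned int vector);

static sub_handler_fn sub_handlers[NUM_UIVS];

/* Example sub-handler: records the order in which vectors were handled. */
static unsigned int recorded[NUM_UIVS];
static int recorded_count;

static void record_vector(unsigned int vector)
{
    if (recorded_count < NUM_UIVS)
        recorded[recorded_count++] = vector;
}

/* Install a sub-handler for one UIV (0..63). */
static void register_sub_handler(unsigned int vector, sub_handler_fn fn)
{
    assert(vector < NUM_UIVS);
    sub_handlers[vector] = fn;
}

/* What the single registered handler would do with the UIV it receives. */
static void generic_handler(unsigned int vector)
{
    assert(vector < NUM_UIVS);
    if (sub_handlers[vector])
        sub_handlers[vector](vector);
}

/* Model of priority: the highest set bit wins. Returns -1 if none pending. */
static int highest_pending(uint64_t pending)
{
    for (int v = NUM_UIVS - 1; v >= 0; v--)
        if (pending & (UINT64_C(1) << v))
            return v;
    return -1;
}

/* Deliver every pending UIV, highest first; returns how many were handled. */
static int drain_pending(uint64_t *pending)
{
    int handled = 0;
    int v;
    while ((v = highest_pending(*pending)) >= 0) {
        *pending &= ~(UINT64_C(1) << v);
        generic_handler((unsigned int)v);
        handled++;
    }
    return handled;
}
```

With sub-handlers on vectors 3 and 40 and both pending, the model runs vector 40 first and vector 3 second.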
## Private UITT
The current RFC focuses on a private UITT, where each thread has its own
UITT. Thus, different threads executing `_senduipi(index1)` with the
same `index1` may cause different receiver threads to be interrupted.
That's right.
In many cases, the receiver of an interrupt needs to know which thread
sent the interrupt. If we understand the proposal correctly, there are
only 64 user-level interrupt vectors (UIVs), and the UIV is the only
information transmitted to the receiver. The UIV itself allows the
receiver to distinguish different senders through careful management
of the receiver's UIVs.
That's correct. User Interrupts mainly provide a doorbell mechanism,
with the actual data expected to be shared through some existing
mechanism.
If multiple senders want to share the same interrupt vector then they
would have to rely on some sort of shared memory (or similar) mechanism
to relay the relevant information to the receiver. This would likely
come with some latency cost.
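A minimal sketch of that shared-memory approach in portable C11, assuming at most 64 senders that each own one bit of a hypothetical `sender_mask` word living in memory shared with the receiver. Each sender publishes its identity before ringing the doorbell, and the receiver's handler for the shared UIV claims the whole set with one atomic exchange. `announce_sender()` and `collect_senders()` are illustrative names, and `_senduipi()` appears only as a comment so the sketch runs without UINTR hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mailbox for one shared UIV: bit i set means sender i has
 * rung the doorbell since the receiver last collected the mask. */
static _Atomic uint64_t sender_mask;

/* Sender side: publish identity first, then ring the doorbell. */
static void announce_sender(unsigned int sender_id)
{
    assert(sender_id < 64);
    atomic_fetch_or_explicit(&sender_mask, UINT64_C(1) << sender_id,
                             memory_order_release);
    /* With real hardware, the sender would now execute _senduipi(index)
     * using its own UITT index for the receiver. */
}

/* Receiver side (run from the handler for the shared UIV): atomically
 * take the set of senders and reset it to empty. */
static uint64_t collect_senders(void)
{
    return atomic_exchange_explicit(&sender_mask, 0, memory_order_acquire);
}
```

Reading the whole mask, rather than assuming one sender per interrupt, matters because a pending UIV is recorded as a single bit: several doorbells on the same vector that arrive before the handler runs can be expected to collapse into one delivery.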
- QUESTION: Given the code below where the same UIV is registered twice:
```c
/* The same thread registers vector1 twice. */
int uintr_fd1 = uintr_create_fd(vector1, flags);
int uintr_fd2 = uintr_create_fd(vector1, flags);
```
Would `uintr_fd1` be the same as `uintr_fd2`, or would it be registered
with a different index in the UITT table?
In the current design, if the same thread tries to register the same
vector again, the second uintr_create_fd() call fails with an EBUSY
error code.
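The per-thread rule described here (one `uintr_fd` per vector per receiving thread) can be modelled as a 64-bit bitmap of vectors in use. This is a hypothetical user-space model, not the kernel implementation: `struct uintr_receiver_model` and `model_create_fd()` are illustrative names, and returning `-EINVAL` for a vector outside 0..63 is an assumption rather than documented behavior of the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping: bit v set means vector v already
 * has a uintr_fd for this receiving thread. */
struct uintr_receiver_model {
    uint64_t vectors_in_use;
};

/* Returns 0 on success or a negative errno value. */
static int model_create_fd(struct uintr_receiver_model *r, unsigned int vector)
{
    assert(r);
    if (vector >= 64)
        return -EINVAL;          /* assumption: out-of-range vector rejected */
    uint64_t bit = UINT64_C(1) << vector;
    if (r->vectors_in_use & bit)
        return -EBUSY;           /* same thread, same vector: already taken */
    r->vectors_in_use |= bit;
    return 0;
}
```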
- QUESTION: If it is registered in a different index, would the
receiver be able to distinguish the sender if `uintr_fd1` and
`uintr_fd2` are used from two different threads?
- QUESTION: What is the intended future use of the `flags` argument?
In the uintr_create_fd() call, flags would be used to provide options
such as O_CLOEXEC. In general, I added a flags argument to all the system
calls to keep them extendable when new boolean options need to be added.
## Shared UITT
In the case of the shared UITT model, all the threads share the same
UITT and thus, if two different threads execute `_senduipi(index)`
with the same index, they would both cause an interrupt in the
same destination/receiver.
- QUESTION: Since both threads use the same entry (same
destination/receiver), does this mean that the receiver will not be
able to distinguish the sender of the interrupt?
Yes. However, this is true even in the case of a private UITT. It isn't
because the senders used the same UITT index; rather, it is the result
of the senders generating the same UIV.
For example, suppose a receiver creates two FDs with two unique vectors:
```c
int uintr_fd1 = uintr_create_fd(vector1, flags);
int uintr_fd2 = uintr_create_fd(vector2, flags);
```
In the case of a private UITT, both sender threads can register
themselves with uintr_fd1. They might get different UITT indexes
returned to them. But when they generate a User Interrupt using their
respective index, the end result is the same: the receiver sees
vector1 being generated. There is no way for the receiver to
distinguish the sender without some additional information being
shared somewhere.
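The private-UITT scenario can be modelled in a few lines of C. In this hypothetical sketch, `model_register_sender()` and `model_senduipi()` are illustrative stand-ins for `uintr_register_sender` and `_senduipi`, and the 8-entry table size is arbitrary. Each sender owns its UITT, a UITT entry names a (receiver, vector) pair, and sending sets that vector's bit in the receiver's pending bitmap. Two senders can end up with different indexes for the same `uintr_fd`, yet the receiver only ever observes `vector1`.

```c
#include <assert.h>
#include <stdint.h>

#define UITT_SIZE 8 /* arbitrary, hypothetical table size */

/* Receiver model: bit v set means UIV v is pending. */
struct receiver {
    uint64_t pending;
};

/* One UITT entry: the (receiver, vector) pair a uintr_fd refers to. */
struct uitt_entry {
    struct receiver *target;
    unsigned int vector;
};

/* A private UITT belongs to exactly one sender thread. */
struct uitt {
    struct uitt_entry entries[UITT_SIZE];
    int used;
};

/* Model of sender registration: append an entry and return its index. */
static int model_register_sender(struct uitt *t, struct receiver *r,
                                 unsigned int vector)
{
    assert(vector < 64);
    if (t->used == UITT_SIZE)
        return -1; /* table full */
    t->entries[t->used] = (struct uitt_entry){ r, vector };
    return t->used++;
}

/* Model of sending: post the entry's vector to its receiver. Nothing
 * about the sender's identity reaches the receiver. */
static void model_senduipi(const struct uitt *t, int index)
{
    assert(index >= 0 && index < t->used);
    const struct uitt_entry *e = &t->entries[index];
    e->target->pending |= UINT64_C(1) << e->vector;
}
```

With a shared UITT, both threads would index one table and obtain the same entry, but the receiver's view would be identical: a pending bit for `vector1` and nothing that names the sender.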
# Multi-threaded parallel programming example
One of the uses for UIs that we have been exploring is combining the
message-passing and shared memory models for parallel programming. In
our approach, message passing is used for synchronization and shared
memory for data sharing. The message-passing part of the programming
pattern is loosely based on Active Messages (see ISCA '92), where a
particular thread can turn interrupts off and on to ignore incoming
messages, so it can execute critical sections without having to
notify any other threads in the system.
This looks like a good fit for the User IPI (UST2UST) implementation in
this RFC. Have you had a chance to evaluate the current API design for
this usage?
Also, is any of the above work publicly available?
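The Active Messages pattern maps naturally onto `_clui`/`_stui`: disable delivery around a critical section, and let anything that arrived in the meantime run when delivery is re-enabled. The following is a single-threaded software model of that behavior under stated assumptions: `struct ui_thread`, `model_post()`, `model_clui()` and `model_stui()` are hypothetical, the `uif` field stands in for the local flag that `_stui`/`_clui` modify, and handler nesting and the real delivery path are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one receiving thread. */
struct ui_thread {
    bool uif;         /* delivery enabled (set by stui, cleared by clui) */
    uint64_t pending; /* bit v set: UIV v posted but not yet handled */
    int handled;      /* number of handler invocations so far */
};

/* Run pending UIVs, highest first, for as long as delivery is enabled. */
static void deliver(struct ui_thread *t)
{
    for (int v = 63; v >= 0 && t->uif; v--) {
        uint64_t bit = UINT64_C(1) << v;
        if (t->pending & bit) {
            t->pending &= ~bit;
            t->handled++; /* stand-in for invoking the handler with v */
        }
    }
}

/* A message arrives: mark it pending; it runs now only if uif is set. */
static void model_post(struct ui_thread *t, unsigned int vector)
{
    assert(vector < 64);
    t->pending |= UINT64_C(1) << vector;
    deliver(t);
}

static void model_clui(struct ui_thread *t) { t->uif = false; }

/* Re-enable delivery and run whatever arrived during the critical section. */
static void model_stui(struct ui_thread *t)
{
    t->uif = true;
    deliver(t);
}
```

A critical section then becomes `model_clui(&t);`, the protected work, and `model_stui(&t);`. Because pending vectors are bits, two messages on the same UIV that arrive during the critical section are handled once after `model_stui()`, so any per-message data still has to travel through shared memory.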
- QUESTION: Is there any data on the performance impact of `_stui` and
`_clui`?
_stui and _clui are expected to have minimal overhead since they only
modify a local flag. I'll try to measure this the next time I do some
performance measurements.
Thanks,
Sohil