On Thu, Jan 14, 2021 at 09:22:54AM +0000, Christoph Lameter wrote:
> On Wed, 13 Jan 2021, Marcelo Tosatti wrote:
>
> > So as discussed, this is one possible prctl interface for
> > task isolation.
> >
> > Is this something that is desired? If not, what is the
> > proper way for the interface to be?
>
> Sure that sounds like a good beginning but I guess we need some
> specificity on the features
>
> > +Task isolation CPU interface
> > +============================
>
> How does one do a oneshot flush of OS activities?

        ret = prctl(PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE, 0, 0, 0);
        if (ret == -1) {
                perror("prctl PR_TASK_ISOLATION_REQUEST");
                exit(1);
        }

>
> I.e. I have a polling loop over numerous shared and I/O devices in user
> space and I want to make sure that the system is quiet before I enter the
> loop.

You could configure things in two ways: with syscalls allowed or not.

Syscalls disallowed:
====================

1) Add a new isolation feature ISOL_F_BLOCK_SYSCALLS (to block certain
syscalls) along with ISOL_F_SETUP_NOTIF (to notify upon isolation breaking):

        if ((ifeat & ISOL_F_BLOCK_SYSCALLS) == ISOL_F_BLOCK_SYSCALLS) {
                struct task_isolation_block_syscalls tibs = {
                        /* list of syscalls to block, additional parameters */
                };
                struct task_isolation_notif tis = {
                        /* parameters to control signal handling
                         * upon isolation breaking event */
                };

                ret = prctl(PR_TASK_ISOLATION_SET, ISOL_F_SETUP_NOTIF, &tis);
                if (ret != 0) {
                        ...
                }
                featuremask |= ISOL_F_SETUP_NOTIF;

                ret = prctl(PR_TASK_ISOLATION_SET, ISOL_F_BLOCK_SYSCALLS, &tibs);
                if (ret != 0) {
                        ...
                }
                featuremask |= ISOL_F_BLOCK_SYSCALLS;

                featuremask |= ISOL_F_QUIESCE;
        }

This would require knowledge of the behaviour of individual system calls,
that is, whether or not these syscalls cause the CPU to be a target of
interruptions (1) (while the QUIESCE / HARD / WARN division you propose
allows for coarse-grained control).

Perhaps coarse control while also allowing finer-grained control
(if desired) is a useful choice?

1: for example adding free pages to per-cpu free lists.

Syscalls allowed:
=================

> In the loop itself some activities may require syscalls so they will
> potentially cause the OS services such as timers to start again.

Or a different mode where the syscall return itself can finish any
pending activities.

> When such
> an activity is complete another quiet down call can be issued.

That said, an explicit quiet-down call seems more efficient if multiple
syscalls are to be used.

> Could be implemented by setting a flag that does an action and then resets
> itself? Or the flag could be reset if a syscall that requires timers etc
> is used?

You mean to let userspace know if a certain syscall triggered a pending
action which must be finished (before "quiet mode" is entered again)?

Sounds like a good idea.

> Features that I think may be needed:
>
> F_ISOL_QUIESCE -> quiet down now but allow all OS activities. OS
>                   activities reset flag
>
> F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
>                          require such actions in the future.

Question: why BAREMETAL?

Two comments:

1) HARD mode could also block activities from different CPUs that can
interrupt this isolated CPU (for example CPU hotplug, or increasing
per-CPU trace buffer size).

Unclear whether such blockage should be performed on:

        -> Individual action basis (eg: BLOCK_CPU_HOTPLUG,
           BLOCK_PERCPU_TRACEBUFFER_SIZE, ...) (which could allow
           individual unblocking through a sysfs interface, for example).
Or
        -> Be tied to a flag with a less implementation specific meaning
           such as F_ISOL_BAREMETAL_HARD.

2) For some types of application, certain interruptions can be
tolerated, as long as they do not cross certain thresholds.

For example, one loses the flexibility to read/write MSRs on the isolated
CPUs (including performance counters, RDT/MBM type MSRs, frequency/power
statistics) by forcing a "no interruptions" mode.
That flexibility seems to be useful (so perhaps F_ISOL_BAREMETAL_HARD but
optionally permitting certain interruptions).

> F_ISOL_BAREMETAL_WARN -> Similar. Create a warning in the syslog when OS
>                          services require delayed processing etc
>                          but continue while resetting the flag.

Alex seems to be interested in different notification methods as well.

Thanks for the input.
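
P.S. To make the "syscalls allowed" flow concrete, here is a minimal
userspace model of the self-resetting flag idea: a quiesce request sets
the flag, a syscall that restarts OS activity clears it, and the polling
loop issues another quiet-down call only when the flag has been cleared.
This is only a sketch. struct isol_state and the isol_* helpers are
made-up names for illustration, not a proposed kernel ABI, and the model
assumes that a quiesce request flushes all pending work synchronously.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of per-task isolation state; not a kernel structure. */
struct isol_state {
        bool quiesced;           /* set by a quiesce request, cleared by OS activity */
        unsigned pending_work;   /* deferred work items still outstanding */
        unsigned quiesce_calls;  /* number of quiesce requests issued so far */
};

/*
 * Models a one-shot quiesce request (what ISOL_F_QUIESCE would ask for):
 * flush pending work and set the flag.
 */
void isol_quiesce(struct isol_state *s)
{
        assert(s != NULL);
        s->pending_work = 0;
        s->quiesced = true;
        s->quiesce_calls++;
}

/*
 * Models a syscall. If it restarts OS activity (timers, deferred work),
 * the flag resets itself so userspace can tell that isolation was lost.
 */
void isol_syscall(struct isol_state *s, bool triggers_os_activity)
{
        assert(s != NULL);
        if (triggers_os_activity) {
                s->pending_work++;
                s->quiesced = false;
        }
}

/*
 * One iteration of the polling loop. Re-quiesces only if the flag was
 * reset. Returns true if a quiesce request was issued.
 */
bool isol_poll_iteration(struct isol_state *s, bool needs_syscall,
                         bool syscall_is_noisy)
{
        assert(s != NULL);
        if (needs_syscall)
                isol_syscall(s, syscall_is_noisy);
        if (!s->quiesced) {
                isol_quiesce(s);
                return true;
        }
        return false;
}
```

In this model an iteration that makes no syscall, or only a syscall that
leaves OS services alone, costs no extra quiesce request. That is the
efficiency argument for letting userspace see the flag rather than
flushing on every syscall return.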