On Thu, Jan 21, 2021 at 01:20:59PM -0300, Marcelo Tosatti wrote:
>
> Adding Nitesh to CC.
>
> On Thu, Jan 21, 2021 at 12:51:41PM -0300, Marcelo Tosatti wrote:
> > Hi Alex,
> >
> > On Fri, Jan 15, 2021 at 10:35:14AM -0800, Alex Belits wrote:
> > > On 1/15/21 05:24, Christoph Lameter wrote:
> > >
> > > > ----------------------------------------------------------------------
> > > > On Thu, 14 Jan 2021, Marcelo Tosatti wrote:
> > > >
> > > > > > How does one do a oneshot flush of OS activities?
> > > > >
> > > > > ret = prctl(PR_TASK_ISOLATION_REQUEST, ISOL_F_QUIESCE, 0, 0, 0);
> > > > > if (ret == -1) {
> > > > >         perror("prctl PR_TASK_ISOLATION_REQUEST");
> > > > >         exit(0);
> > > > > }
> > > > >
> > > > > >
> > > > > > I.e. I have a polling loop over numerous shared and I/o devices in user
> > > > > > space and I want to make sure that the system is quite before I enter the
> > > > > > loop.
> > > > >
> > > > > You could configure things in two ways: with syscalls allowed or not.
> > > >
> > > > Well syscalls that do not cause deferred processing like getting the time
> > > > or determining the current cpu should be ok to use.
> > >
> > > Some of those syscalls go through vdso, and don't enter the kernel --
> > > nothing specific is necessary to allow them, and it would be pointless and
> > > difficult to prevent them.
> > >
> > > For syscalls that enter the kernel, it's often difficult to predict, if they
> > > will or won't cause deferred processing, so I am afraid, it won't be
> > > possible to provide a "safe" class of syscalls for this purpose and not end
> > > up with something minimal like reading /sys and /proc. Right now isolation
> > > only "allows" syscalls that exit isolation.
> >
> > Christoph wrote:
> >
> > "> Features that I think may be needed:
> > >
> > > F_ISOL_QUIESCE -> quiet down now but allow all OS activities. OS
> > >                   activites reset flag
> > >
> > > F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
> > >                          require such actions in the future.
> > >
> > > F_ISOL_BAREMETAL_WARN -> Similar. Create a warning in the syslog when OS
> > >                          services require delayed processing etc
> > >                          but continue while resetting the flag.
> > "
> >
> > It seems the only difference between HARD and WARN (lets call it SOFT)
> > would be whether a notification is sent to userspace.
> >
> > The definition
> >
> > "F_ISOL_BAREMETAL_HARD -> No OS interruptions. Fault on syscalls that
> > require such actions in the future."
> >
> > fails in the static_key_enable case: Alex's idea is to queue the i-cache
> > flush if the remote task/cpu is in isolated mode (and perform the flush
> > when entering the kernel).
> >
> > So even if userspace uses syscalls that do not require delayed
> > processing, there are events which are out of control of the
> > application and might require it.
> >
> > So lets assume the application performs a number of syscalls on a
> > given time critical codepath.
> >
> > Either the system is configured so that
> > the number/frequency of static_key_enable's is limited, or the cost of
> > i-cache flushes must be accounted on that critical codepath.
> >
> > Anyway, trying to improve Christoph's definition:
> >
> > F_ISOL_QUIESCE   -> flush any pending operations that might cause
> >                     the CPU to be interrupted (ex: free's
> >                     per-CPU queues, sync MM statistics
> >                     counters, etc).
> >
> > F_ISOL_ISOLATE   -> inform the kernel that userspace is
> >                     entering isolated mode (see description
> >                     below on "ISOLATION MODES").
> >
> > F_ISOL_UNISOLATE -> inform the kernel that userspace is
> >                     leaving isolated mode.
> >
> > F_ISOL_NOTIFY    -> notification mode of isolation breakage
> >                     modes.
> >
> >
> > Isolation modes:
> > ---------------
> >
> > There are two main types of isolation modes:
> >
> > - SOFT mode: does not prevent activities which might generate interruptions
> > (such as CPU hotplug).
> >
> > - HARD mode: prevents all blockable activities that might generate interruptions.
> > Administrators can override this via /sys.
> >
> > Notifications:
> > -------------
> >
> > Notification mode of isolation breakage can be configured as follows:
> >
> > - None (default): No notification is performed by the kernel on isolation
> > breakage.
> >
> > - Syslog: Isolation breakage is reported to syslog.
> >
> > (new modes can be added, for example signals).
> >
> > A new feature can be added to disallow syscalls (by default syscalls
> > are enabled, with reporting of pending activities that might cause
> > an interruption in a VDSO).

After discussion with Juri and Daniel, it became clearer that supporting
unmodified applications would be quite useful:

	- enter isolation mode
	- run unmodified application
	- leave isolation mode

This could work via an additional mode which goes through the quiesce
operation at every syscall return.

Since this includes freeing per-CPU pagevecs (and therefore allocating
per-CPU pagevecs again at the next syscall), it might considerably slow
down system startup (and cause MM-related spinlock contention).

Better ideas are appreciated.
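
For illustration only, the request flags discussed above could be used around
a time-critical loop roughly as sketched below. Everything in it is an
assumption: PR_TASK_ISOLATION_REQUEST and the F_ISOL_* flags are proposals
from this thread (the earlier snippet spells the flag ISOL_F_QUIESCE), they
are not in mainline headers, the numeric values are placeholders, and
poll_once() merely stands in for the application's polling loop. On a kernel
without such an interface prctl() fails and the loop is not run.

```c
#include <stdio.h>
#include <sys/prctl.h>

/*
 * HYPOTHETICAL: names follow the thread, values are placeholders.
 * None of these exist in mainline <linux/prctl.h>.
 */
#define PR_TASK_ISOLATION_REQUEST 0x49534f4c /* placeholder option number */

#define F_ISOL_QUIESCE   (1UL << 0) /* flush pending per-CPU work now */
#define F_ISOL_ISOLATE   (1UL << 1) /* userspace enters isolated mode */
#define F_ISOL_UNISOLATE (1UL << 2) /* userspace leaves isolated mode */
#define F_ISOL_NOTIFY    (1UL << 3) /* configure breakage notification */

/* Issue one isolation request; returns 0 on success, -1 with errno set. */
static int isol_request(unsigned long flags)
{
	return prctl(PR_TASK_ISOLATION_REQUEST, flags, 0UL, 0UL, 0UL);
}

/* Stand-in for the application's time-critical loop: records that it ran. */
static void poll_once(void *arg)
{
	*(int *)arg = 1;
}

/*
 * Quiesce, enter isolation, run the loop, leave isolation.
 * Returns -1 without running the loop if quiesce or isolate is refused;
 * otherwise returns the result of the unisolate request.
 */
static int run_isolated(void (*loop)(void *), void *arg)
{
	if (isol_request(F_ISOL_QUIESCE) == -1 ||
	    isol_request(F_ISOL_ISOLATE) == -1) {
		perror("prctl PR_TASK_ISOLATION_REQUEST");
		return -1;
	}

	loop(arg);

	return isol_request(F_ISOL_UNISOLATE);
}
```

A caller would do something like run_isolated(poll_once, &ran). For the
unmodified-application case, the wrapper would presumably exec the
application instead of calling a function, with the kernel doing the quiesce
at every syscall return as described above.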