On Sat, 2020-03-07 at 19:42 -0800, Alex Belits wrote:
> This is the updated version of task isolation patchset.
>
> 1. Commit messages updated to match changes.
> 2. Sign-off lines restored from original patches, changes listed
>    wherever applicable.
> 3. arm platform -- added missing calls to syscall check and cleanup
>    procedure after leaving isolation.
> 4. x86 platform -- added missing calls to cleanup procedure after
>    leaving isolation.

Another update, addressing CPU state / race conditions.

I believe I have a usable solution for the problem of both missing
events and race conditions on isolation entry and exit.

The idea is to make sure that the CPU core remains in userspace and
runs userspace code regardless of what is happening in the kernel and
userspace in the rest of the system; however, any event that results
in running anything other than userspace code will cause the CPU core
to re-synchronize with the rest of the system. Then any kernel code,
with the exception of a small and clearly defined set of routines that
only perform kernel entry / exit themselves, will run on the CPU after
all synchronization is done.

This requires an answer to possible races between isolation entry /
exit (including isolation breaking on interrupts) and updates that are
normally carried out by IPIs. So the solution should involve some
mechanism that limits what runs on the CPU in its "stale" state, and
causes inevitable synchronization before the rest of the kernel is
called. This should also cover preemption -- if preemption happens in
that "stale" state after entering the kernel but before
synchronization is completed, it should still go through
synchronization before running the rest of the kernel.

Then, as long as it can be demonstrated that routines running in the
"stale" state can safely run in it, and that any event that would
normally require an IPI results in entering the rest of the kernel
only after synchronization, races cease to be a problem: any sequence
of events results in exactly the same CPU state when hitting the rest
of the kernel as if the CPU had processed the update event through an
IPI. I was under the impression that this is already the case;
however, after a closer look it appears that some barriers must be in
place to make sure that the sequence of events is enforced.

There is obviously a question of performance -- we don't want to cause
any additional flushes or add locking in anything time-critical.
Fortunately, entering and exiting isolation (as opposed to events that
_potentially_ can call isolation-breaking routines) is never
performance-critical: it's what starts and ends a task that has no
performance-critical communication with the kernel. So if a CPU that
has an isolated task on it is running kernel code, it means that
either the task is not isolated yet (we are exiting to userspace), or
it is no longer running anything performance-critical (intentionally,
on exit from isolation, or unintentionally, on an isolation-breaking
event).

Isolation state is read-mostly, and we would prefer RCU for it if we
could guarantee that the "stale" state remains safe in all code that
runs until synchronization happens. I am not sure of that, so I tried
to make something more straightforward; however, I might be wrong, and
RCU-ifying exit from isolation may be a better way to do it. For now I
want to make sure that there is some clearly defined, small amount of
kernel code that runs before the inevitable synchronization, and that
code is unaffected by the "stale" state.
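To make the intended ordering concrete, here is a minimal sketch of
the entry-side pattern described above. It is only an illustration of
the idea, not the actual patch code: ll_isolation_state, the
LL_STATE_* values and task_isolation_kernel_enter() are hypothetical
names invented for this example; fast_task_isolation_cpu_cleanup() is
the cleanup routine from the patchset.

#include <linux/atomic.h>
#include <linux/percpu.h>

#define LL_STATE_NORMAL		0
#define LL_STATE_ISOLATED	1

static DEFINE_PER_CPU(atomic_t, ll_isolation_state);

/* Called early on every kernel entry path, before the rest of the
 * kernel runs anything on this CPU. */
static inline void task_isolation_kernel_enter(void)
{
	atomic_t *st = this_cpu_ptr(&ll_isolation_state);

	/* Fast path for non-isolated CPUs: one read, no barriers. */
	if (atomic_read(st) != LL_STATE_ISOLATED)
		return;

	/*
	 * Leave the isolated state atomically, so a remote CPU that
	 * checks the state (instead of sending an IPI) and this CPU
	 * agree on which side performs the synchronization.
	 */
	if (atomic_cmpxchg(st, LL_STATE_ISOLATED, LL_STATE_NORMAL)
	    == LL_STATE_ISOLATED) {
		/*
		 * Full barrier: make all updates that remote CPUs
		 * published while this CPU was isolated visible here,
		 * as if the usual IPI had been processed.
		 */
		smp_mb();
		fast_task_isolation_cpu_cleanup();
	}
}

Since the cmpxchg only ever fires on the slow path (kernel entry on an
isolated CPU, which by definition is no longer performance-critical),
the fast path for non-isolated CPUs stays a single plain read.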
I have tried to track down all call paths from kernel entry points to
the call of fast_task_isolation_cpu_cleanup(), and will post those
separately. It's possible that all architecture-specific code already
follows some clearly defined rules about this for other reasons;
however, I am not that familiar with all of it, so I tried to check
whether the existing implementation is always safe to run in the
"stale" state everywhere before the point where task isolation calls
its cleanup routine. For now, this is the implementation that assumes
the "stale" state is safe for kernel entry.
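As a rough illustration of where such a check has to sit on those call
paths (again only a sketch, with a made-up, simplified entry function;
the real entry code is per-architecture and differs between syscall,
interrupt and exception paths):

#include <linux/ptrace.h>

/*
 * Hypothetical, simplified kernel-entry function. The only point
 * being illustrated is the ordering: the task isolation hook runs
 * before any other kernel work on this CPU.
 */
void sketch_enter_from_user_mode(struct pt_regs *regs)
{
	task_isolation_kernel_enter();	/* synchronize "stale" state */

	/*
	 * Only after the synchronization above is it safe to run the
	 * rest of the kernel: context tracking, RCU entry, tracing,
	 * syscall or interrupt handling, etc.
	 */
}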