Add documentation for an introduction to:
 - context switching
 - x86 context switch
 - MIPS context switch

Suggested-by: Lukas Bulwahn <lukas.bulwahn@xxxxxxxxx>
Co-developed-by: Mostafa Chamanara <mostafa.chamanara@xxxxxxxxxxxx>
Signed-off-by: Mostafa Chamanara <mostafa.chamanara@xxxxxxxxxxxx>
Signed-off-by: John Mathew <john.mathew@xxxxxxxxxx>
---
 Documentation/scheduler/arch-specific.rst     |  3 +
 Documentation/scheduler/context-switching.rst | 71 +++++++++++++++++
 Documentation/scheduler/index.rst             |  1 +
 .../scheduler/mips-context-switch.rst         | 78 +++++++++++++++++++
 .../scheduler/x86-context-switch.rst          | 59 ++++++++++++++
 5 files changed, 212 insertions(+)
 create mode 100644 Documentation/scheduler/context-switching.rst
 create mode 100644 Documentation/scheduler/mips-context-switch.rst
 create mode 100644 Documentation/scheduler/x86-context-switch.rst

diff --git a/Documentation/scheduler/arch-specific.rst b/Documentation/scheduler/arch-specific.rst
index c9c34863d994..65dc393b605f 100644
--- a/Documentation/scheduler/arch-specific.rst
+++ b/Documentation/scheduler/arch-specific.rst
@@ -9,3 +9,6 @@ Architecture Specific Scheduler Implementation Differences
 .. toctree::
    :maxdepth: 2
+
+   x86-context-switch
+   mips-context-switch
diff --git a/Documentation/scheduler/context-switching.rst b/Documentation/scheduler/context-switching.rst
new file mode 100644
index 000000000000..dd4ff63b1e97
--- /dev/null
+++ b/Documentation/scheduler/context-switching.rst
@@ -0,0 +1,71 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==========================
+Process context switching
+==========================
+
+Context Switching
+-----------------
+
+Context switching, the act of switching the CPU from one running task to
+another, is handled by the :c:func:`context_switch()` function defined in
+kernel/sched/core.c. It is called by __schedule() when a new task has been
+selected to run.
+
+The execution flow is as follows:
+
+* Call prepare_task_switch() to prepare both the previous and the new task
+  by updating bookkeeping state in their task_struct.
+
+* Call the :c:macro:`arch_start_context_switch()` macro, a facility that
+  lets paravirtualized guests batch the reload of page tables and other
+  process state with the actual context switch code. By convention, only one
+  of the batched update (lazy) modes (CPU, MMU) should be active at any
+  given time, entry should never be nested, and entry and exit should always
+  be paired, for the sanity of maintaining and reasoning about the kernel
+  code. Here the exit (the end of the context switch) happens in
+  architecture-specific code, so no generic definition is needed.
+
+* The next few steps consist of handling the transfer of the real and the
+  borrowed (anonymous) address spaces between the switching tasks. The four
+  possible context switches are:
+
+  - a kernel task switching to another kernel task,
+  - a user task switching to a kernel task,
+  - a kernel task switching to a user task,
+  - a user task switching to another user task.
+
+For a kernel task switching to a kernel task, :c:func:`enter_lazy_tlb()` is
+called. This architecture-specific function handles a context without an mm,
+and architectures implement lazy tricks here to minimize TLB flushes. The
+active address space of the previous task is then borrowed (transferred) by
+the next task, and the active address space of the previous task is set to
+NULL.
+
+For a user task switching to a kernel task, the user task has a real address
+space. This address space is pinned by calling :c:func:`mmgrab()`, which
+makes sure that it will not be freed even if the previous task exits.
+
+For a user task switching to a user task, the architecture-specific
+:c:func:`switch_mm_irqs_off()` or :c:func:`switch_mm()` function is called.
+The main job of these calls is to switch the address space between the two
+user processes; this includes switching the page table pointers and ensuring
+that the new address space has a valid ASID.
+
+For a kernel task switching to a user task, the active address space of the
+kernel task is transferred to the user task, and the active address space of
+the kernel task is set to NULL. A simplified sketch of this mm hand-off
+appears at the end of this document.
+
+* Next, the :c:func:`prepare_lock_switch()` function is called to do a
+  lockdep release of the runqueue lock, handling the special case of the
+  scheduler in which the runqueue lock will be released by the next task.
+
+* Then the architecture-specific implementation of the :c:func:`switch_to()`
+  function is called to switch the register state and the stack. This
+  involves saving and restoring the stack information and the processor
+  registers, and any other architecture-specific state that must be managed
+  and restored on a per-process basis.
+
+* Finally, finish_task_switch() must be called after the context switch,
+  paired with the prepare_task_switch() call made before it. It reconciles
+  the locking set up by prepare_task_switch() and performs any other
+  architecture-specific cleanup actions.
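+The mm hand-off described above can be pictured with a small, self-contained
+user-space sketch. This is not kernel code: struct mm, struct task and the
+helpers below are illustrative stand-ins for struct mm_struct, struct
+task_struct, mmgrab() and mmdrop()::
+
+  #include <assert.h>
+  #include <stdlib.h>
+
+  struct mm { int refcount; };    /* stand-in for struct mm_struct */
+  struct task {
+          struct mm *mm;          /* real address space, NULL for kernel tasks */
+          struct mm *active_mm;   /* address space currently in use */
+  };
+
+  static void mm_grab(struct mm *mm) { mm->refcount++; }
+
+  static void mm_drop(struct mm *mm)
+  {
+          if (--mm->refcount == 0)
+                  free(mm);
+  }
+
+  /* Toy model of the mm hand-off done by context_switch(). */
+  static void context_switch_mm(struct task *prev, struct task *next)
+  {
+          if (!next->mm) {                /* next is a kernel task */
+                  /* borrow the previous task's active address space */
+                  next->active_mm = prev->active_mm;
+                  if (prev->mm)           /* user -> kernel: pin the mm */
+                          mm_grab(prev->active_mm);
+                  else                    /* kernel -> kernel: hand over the pin */
+                          prev->active_mm = NULL;
+          } else {                        /* next is a user task */
+                  /* the kernel switches page tables here: switch_mm_irqs_off() */
+                  next->active_mm = next->mm;
+                  if (!prev->mm) {        /* kernel -> user: unpin the borrowed mm */
+                          /* the real kernel defers this drop to finish_task_switch() */
+                          mm_drop(prev->active_mm);
+                          prev->active_mm = NULL;
+                  }
+          }
+  }
+
+  int main(void)
+  {
+          struct mm *umm = calloc(1, sizeof(*umm));
+          struct task user = { .mm = umm, .active_mm = umm };
+          struct task kthread = { 0 };
+
+          umm->refcount = 1;                   /* the user task's own reference */
+          context_switch_mm(&user, &kthread);  /* user -> kernel: refcount is 2 */
+          context_switch_mm(&kthread, &user);  /* kernel -> user: refcount is 1 */
+          assert(umm->refcount == 1);
+          return 0;
+  }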
diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst
index 9772cf81fd96..289e06a68764 100644
--- a/Documentation/scheduler/index.rst
+++ b/Documentation/scheduler/index.rst
@@ -18,6 +18,7 @@ specific implementation differences.
    overview
    cfs-sched-overview
+   context-switching
    sched-features
    arch-specific.rst
    sched-debugging.rst
diff --git a/Documentation/scheduler/mips-context-switch.rst b/Documentation/scheduler/mips-context-switch.rst
new file mode 100644
index 000000000000..e917bbe1c104
--- /dev/null
+++ b/Documentation/scheduler/mips-context-switch.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+==============================================
+MIPS Architecture And Scheduler Implementation
+==============================================
+
+Multi-threading in MIPS CPUs
+----------------------------
+
+The MIPS architecture defines four coprocessors:
+
+- CP0: supports the virtual memory system and exception handling.
+- CP1: reserved for the floating point coprocessor, the FPU.
+- CP2: available for implementation-specific use.
+- CP3: reserved for floating point operations in the release 1
+  implementation of MIPS64.
+
+The MIPS32 and MIPS64 architectures provide support for optional components
+known as Modules or Application Specific Extensions. The MT module enables
+the architecture to support multi-threaded implementations, which includes
+support for virtual processors and lightweight thread contexts. The
+implementation of the MT features depends on the individual MIPS core. A
+virtual processing element (VPE) maintains a complete copy of the processor
+state as seen by the software system, which includes interrupts, the
+register set and the MMU. This enables a single processor with two VPEs to
+appear to an SMP operating system as two separate cores. For example, two
+separate operating systems, such as Linux and an RTOS, can run one on each
+VPE.
+
+A lighter version of a VPE, called a Thread Context (TC), enables threading
+at the user/application software level. A TC is the hardware state necessary
+to support a thread of execution: a set of general purpose registers (GPRs),
+a program counter (PC), and some multiplier and coprocessor state. TCs share
+a common execution unit, and the MIPS ISA provides instructions to utilize
+them.
+
+The Quality of Service (QoS) block of the MT module allows the allocation of
+processor cycles to threads and sets relative thread priorities. This
+enables two thread prioritization mechanisms: the user can prioritize one
+thread over another, as well as allocate a specific ratio of the cycles to
+specific threads. These mechanisms help to allocate bandwidth to a set of
+threads effectively. The QoS block improves system-level determinism and
+predictability, and can be replaced by more application-specific blocks.
+
+MIPS Context Switch
+-------------------
+
+Context switch behavior specific to MIPS begins in the way the
+:c:macro:`switch_to()` macro is implemented. The main steps in the MIPS
+implementation of the macro are:
+
+* Handle the FPU affinity management feature, enabled by
+  :c:macro:`CONFIG_MIPS_MT_FPAFF` at build time. The macro checks whether
+  the FPU was used in the most recent time slice; if it was not, the
+  restriction of having to run on a CPU with an FPU is removed.
+* For the previous task, disable the FPU and clear the bit indicating that
+  the FPU was used in this quantum for the task.
+* If the FPU is enabled in the next task, check the FCSR for any unmasked
+  pending exceptions, clear them and send a signal.
+* If the MIPS DSP module is enabled, save the DSP context of the previous
+  task and restore the DSP context of the next task.
+* If coprocessor 2 is present, set its access allowed field for the next
+  task, and clear it if it was set in the previous task.
+* Clear the LLBit on MIPS release 6, so that the eretnc instruction can be
+  used unconditionally when returning to userland in entry.S. The LLBit is
+  used to specify the operation for instructions that provide an atomic
+  read-modify-write: it is set when a linked load (ll) occurs, and is tested
+  by the conditional store (sc). It is cleared, during other CPU operation,
+  when a store to the location would no longer be atomic; in particular, it
+  is cleared by exception return instructions. The eretnc instruction makes
+  it possible to return from an interrupt, exception or error trap without
+  clearing the LLBit. A user-space sketch of the resulting ll/sc retry
+  behavior is given at the end of this document.
+* Clear the global variable ll_bit used by the MIPS exception handler.
+* Write the thread pointer to the MIPS UserLocal register if the CPU
+  supports this feature. This register is not interpreted by hardware and
+  can be used to share data between privileged and unprivileged software.
+* If the hardware watchpoint feature is enabled at build time, restore the
+  watchpoint registers from the next task.
+* Finally, call the MIPS processor-specific implementation of the
+  :c:func:`resume()` function, which restores the registers of the next
+  task, including the stack pointer. The implementation is in assembly::
+
+    arch/mips/kernel/r4k_switch.S
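+The LLBit behavior described above can be observed from user space: on MIPS,
+C11 atomic read-modify-write operations are compiled to ll/sc loops, and a
+conditional store whose LLBit has been cleared (for example, by a context
+switch occurring between the ll and the sc) simply fails, so the loop
+retries. The following portable C sketch (not kernel code) shows that retry
+pattern::
+
+  #include <stdatomic.h>
+  #include <stdio.h>
+
+  /*
+   * On MIPS this compare-exchange loop is built from ll (linked load) and
+   * sc (store conditional). If the LLBit is cleared between the two
+   * instructions -- e.g. because the kernel context-switched this task --
+   * the sc fails, atomic_compare_exchange_weak() returns false with a
+   * freshly loaded value in "old", and the loop retries.
+   */
+  static int fetch_and_inc(atomic_int *v)
+  {
+          int old = atomic_load_explicit(v, memory_order_relaxed);
+
+          while (!atomic_compare_exchange_weak(v, &old, old + 1))
+                  ;       /* sc failed: retry with the reloaded value */
+          return old;
+  }
+
+  int main(void)
+  {
+          atomic_int counter = 0;
+
+          fetch_and_inc(&counter);
+          printf("%d\n", atomic_load(&counter));  /* prints 1 */
+          return 0;
+  }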
diff --git a/Documentation/scheduler/x86-context-switch.rst b/Documentation/scheduler/x86-context-switch.rst
new file mode 100644
index 000000000000..ae7b2e09453a
--- /dev/null
+++ b/Documentation/scheduler/x86-context-switch.rst
@@ -0,0 +1,59 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+X86 Context Switch
+------------------
+
+The x86 context switching logic works as follows. After the MM switch in the
+scheduler's :c:func:`context_switch()`, the x86 implementation of
+:c:macro:`switch_to()` is called. For the x86 architecture it is located
+in::
+
+  arch/x86/include/asm/switch_to.h
+
+Since kernel 4.9, switch_to() has been broken into two parts: a
+:c:func:`prepare_switch_to()` macro, and an inline assembly portion which
+has been moved to an actual assembly file::
+
+  arch/x86/entry/entry_64.S
+
+There is still a C portion of the switch, which occurs via a jump in the
+middle of the assembly code. Its source has been located in
+arch/x86/kernel/process_64.c since kernel 2.6.24.
+
+The main job of prepare_switch_to() is to handle the case in which the stack
+uses virtual memory. This is configured at build time (CONFIG_VMAP_STACK)
+and is enabled in most modern distributions. The macro probes the new stack
+pointer to prevent a later double fault: switching to a stack whose
+top-level paging entry is not present in the current MM would result in a
+page fault, which would be promoted to a double fault, and the result would
+be a panic. It is therefore necessary to probe the stack now, so that
+vmalloc_fault() can fix up the page tables.
+
+The main steps of the inline assembly function __switch_to_asm() are:
+
+* Store the callee-saved registers on the old stack, which will be switched
+  away from.
+* Swap the stack pointers between the old and the new task.
+* Move the stack canary value to the current CPU's interrupt stack.
+* If the return trampoline is enabled, overwrite all entries in the RSB on
+  exiting a guest, to prevent malicious branch target predictions from
+  affecting the host kernel.
+* Restore all the registers previously pushed from the new stack, in reverse
+  order.
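+What switch_to() accomplishes can be pictured in user space with the POSIX
+ucontext API, which performs a comparable save and restore of register state
+and a switch to another stack. This is only an illustrative analogy, not
+kernel code::
+
+  #include <stdio.h>
+  #include <ucontext.h>
+
+  static ucontext_t main_ctx, task_ctx;
+  static char task_stack[64 * 1024];      /* the toy task's private stack */
+
+  static void task_func(void)
+  {
+          printf("running on the other stack\n");
+          /* returning here resumes uc_link, i.e. main_ctx */
+  }
+
+  int main(void)
+  {
+          getcontext(&task_ctx);
+          task_ctx.uc_stack.ss_sp = task_stack;
+          task_ctx.uc_stack.ss_size = sizeof(task_stack);
+          task_ctx.uc_link = &main_ctx;   /* resumed when task_func returns */
+          makecontext(&task_ctx, task_func, 0);
+
+          /*
+           * Like __switch_to_asm(): save the callee-saved state of the
+           * current context, then switch the stack pointer and register
+           * state to the new context.
+           */
+          swapcontext(&main_ctx, &task_ctx);
+          printf("back on the original stack\n");
+          return 0;
+  }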
+The main steps of the C function :c:func:`__switch_to()`, which the assembly
+code jumps to, are as follows:
+
+* Retrieve the thread (:c:type:`struct thread_struct <thread_struct>`) and
+  FPU (:c:type:`struct fpu <fpu>`) structs of the next and previous tasks.
+* Get the current CPU's TSS (:c:type:`struct tss_struct <tss_struct>`).
+* Save the current FPU state while still on the old task.
+* Store the FS and GS segment registers before changing the thread local
+  storage.
+* Reload the GDT for the new task's TLS.
+* Save the ES and DS segments of the previous task and load the same from
+  the next task.
+* Load the FS and GS segment registers.
+* Update the current task of the CPU.
+* Update the top-of-stack pointer of the CPU for the entry trampoline.
+* Initialize the FPU state for the next task.
+* Set sp0 to point to the entry trampoline stack.
+* Call :c:func:`__switch_to_xtra()` to handle debug registers, I/O bitmaps
+  and speculation mitigations.
+* Write the task's CLOSid/RMID to the IA32_PQR_MSR.
-- 
2.17.1