On Wed, 15 Nov 2017, Miroslav Benes wrote: > Live patching consistency model is of LEAVE_PATCHED_SET and > SWITCH_THREAD. This means that all tasks in the system have to be marked > one by one as safe to call a new patched function. Safe means when a > task is not (sleeping) in a set of patched functions. That is, no > patched function is on the task's stack. Another clearly safe place is > the boundary between kernel and userspace. The patching waits for all > tasks to get outside of the patched set or to cross the boundary. The > transition is completed afterwards. > > The problem is that a task can block the transition for quite a long > time, if not forever. It could sleep in a set of patched functions, for > example. Luckily we can force the task to leave the set by sending it a > fake signal, that is a signal with no data in signal pending structures > (no handler, no sign of proper signal delivered). Suspend/freezer use > this to freeze the tasks as well. The task gets TIF_SIGPENDING set and > is woken up (if it has been sleeping in the kernel before) or kicked by > rescheduling IPI (if it was running on other CPU). This causes the task > to go to kernel/userspace boundary where the signal would be handled and > the task would be marked as safe in terms of live patching. > > There are tasks which are not affected by this technique though. The > fake signal is not sent to kthreads. They should be handled differently. > They can be woken up so they leave the patched set and their > TIF_PATCH_PENDING can be cleared thanks to stack checking. > > For the sake of completeness, if the task is in TASK_RUNNING state but > not currently running on some CPU it doesn't get the IPI, but it would > eventually handle the signal anyway. Second, if the task runs in the > kernel (in TASK_RUNNING state) it gets the IPI, but the signal is not > handled on return from the interrupt. It would be handled on return to > the userspace in the future when the fake signal is sent again. Stack > checking deals with these cases in a better way. > > If the task was sleeping in a syscall it would be woken by our fake > signal, it would check if TIF_SIGPENDING is set (by calling > signal_pending() predicate) and return ERESTART* or EINTR. Syscalls with > ERESTART* return values are restarted in case of the fake signal (see > do_signal()). EINTR is propagated back to the userspace program. This > could disturb the program, but... > > * each process dealing with signals should react accordingly to EINTR > return values. > * syscalls returning EINTR happen to be quite common situation in the > system even if no fake signal is sent. > * freezer sends the fake signal and does not deal with EINTR anyhow. > Thus EINTR values are returned when the system is resumed. > > The very safe marking is done in architectures' "entry" on syscall and > interrupt/exception exit paths, and in a stack checking functions of > livepatch. TIF_PATCH_PENDING is cleared and the next > recalc_sigpending() drops TIF_SIGPENDING. In connection with this, also > call klp_update_patch_state() before do_signal(), so that > recalc_sigpending() in dequeue_signal() can clear TIF_PATCH_PENDING > immediately and thus prevent a double call of do_signal(). > > Note that the fake signal is not sent to stopped/traced tasks. Such task > prevents the patching to finish till it continues again (is not traced > anymore). > > Last, sending the fake signal is not automatic. It is done only when > admin requests it by writing 1 to signal sysfs attribute in livepatch > sysfs directory. > > Signed-off-by: Miroslav Benes <mbenes@xxxxxxx> > Cc: Oleg Nesterov <oleg@xxxxxxxxxx> > Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> > Cc: Andy Lutomirski <luto@xxxxxxxxxx> > Cc: linuxppc-dev@xxxxxxxxxxxxxxxx > Cc: x86@xxxxxxxxxx > --- > Documentation/ABI/testing/sysfs-kernel-livepatch | 12 +++++++ > Documentation/livepatch/livepatch.txt | 11 +++++-- > arch/powerpc/kernel/signal.c | 6 ++-- > arch/x86/entry/common.c | 6 ++-- > kernel/livepatch/core.c | 30 +++++++++++++++++ > kernel/livepatch/transition.c | 41 ++++++++++++++++++++++++ > kernel/livepatch/transition.h | 1 + > kernel/signal.c | 4 ++- I'd like to be queuing this patchset for the next merge window, so if there are any objections for the out-of-kernel/livepatch/* changes, please speak up now. Thanks. -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html