On Thu, Mar 23, 2023 at 05:22:09PM +0100, Petr Mladek wrote: > On Thu 2023-03-23 17:09:13, Petr Mladek wrote: > > On Wed 2023-03-22 18:41:40, Alexey Dobriyan wrote: > > > Hi, Josh. > > > > > > I've been profiling how much time livepatching takes and I have a question > > > regarding these lines: > > > > > > void klp_try_complete_transition(void) > > > { > > > ... > > > if (!complete) { > > > schedule_delayed_work(&klp_transition_work, round_jiffies_relative(HZ)); > > > return; > > > } > > > > > > } > > > > > > The problem here is that if the system is idle, then the previous loop > > > checking idle tasks will reliably sets "complete = false" and then > > > patching wastes time waiting for next second so that klp_transition_work > > > will repeat the same code without reentering itself. > > > > > > I've created trivial patch which changes 2 functions and it takes > > > ~1.3 seconds to complete transition: > > > > > > [ 33.829506] livepatch: 'main': starting patching transition > > > [ 35.190721] livepatch: 'main': patching complete > > > > > > I don't know what's the correct timeout to retry transition work > > > but it can be made near-instant: > > > > > > [ 100.930758] livepatch: 'main': starting patching transition > > > [ 100.956190] livepatch: 'main': patching complete > > > > > > > > > Alexey (CloudLinux) > > > > > > > > > --- a/kernel/livepatch/transition.c > > > +++ b/kernel/livepatch/transition.c > > > @@ -435,8 +435,7 @@ void klp_try_complete_transition(void) > > > * later and/or wait for other methods like kernel exit > > > * switching. > > > */ > > > - schedule_delayed_work(&klp_transition_work, > > > - round_jiffies_relative(HZ)); > > > + schedule_delayed_work(&klp_transition_work, msecs_to_jiffies(1)); > > > > Note that this affects all iterations, not only the first one. In > > practice, it might schedule another klp_transition_work() almost > > immediately. Servers typically use HZ = 250 and 1ms is less than 1 jiffy. > > > > Now, klp_try_complete_transition() takes tasklist_lock and blocks > > forking and exiting. It sounds scary to do this so frequently. > > Is am actually surprised that nobody reported any problems > > even with the 1sec period. > > > > Please, stop these micro-optimizations! > > > > Livepatching is about security and reliability. It is not about speed! > > > > The fact that the transition is so lazy is a great feature. > > Livepatching should make the system more safe without breaking it. > > It is great that it could run in background and do not limit > > the normal workload too much. > > By other words. It might be interesting to find a way how to migrate > the idle tasks during the first klp_try_complete_transition() call. > As a result the 2nd call would not be needed and it would make > livepatching even more lightweight. > > But just increasing the frequency of klp_try_complete_transition() > is not the way to go. 1ms is for demo purposes, to show how fast this thing can go.