On Wed 2023-03-22 18:41:40, Alexey Dobriyan wrote: > Hi, Josh. > > I've been profiling how much time livepatching takes and I have a question > regarding these lines: > > void klp_try_complete_transition(void) > { > ... > if (!complete) { > schedule_delayed_work(&klp_transition_work, round_jiffies_relative(HZ)); > return; > } > > } > > The problem here is that if the system is idle, then the previous loop > checking idle tasks will reliably sets "complete = false" and then > patching wastes time waiting for next second so that klp_transition_work > will repeat the same code without reentering itself. > > I've created trivial patch which changes 2 functions and it takes > ~1.3 seconds to complete transition: > > [ 33.829506] livepatch: 'main': starting patching transition > [ 35.190721] livepatch: 'main': patching complete > > I don't know what's the correct timeout to retry transition work > but it can be made near-instant: > > [ 100.930758] livepatch: 'main': starting patching transition > [ 100.956190] livepatch: 'main': patching complete > > > Alexey (CloudLinux) > > > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -435,8 +435,7 @@ void klp_try_complete_transition(void) > * later and/or wait for other methods like kernel exit > * switching. > */ > - schedule_delayed_work(&klp_transition_work, > - round_jiffies_relative(HZ)); > + schedule_delayed_work(&klp_transition_work, msecs_to_jiffies(1)); Note that this affects all iterations, not only the first one. In practice, it might schedule another klp_transition_work() almost immediately. Servers typically use HZ = 250 and 1ms is less than 1 jiffy. Now, klp_try_complete_transition() takes tasklist_lock and blocks forking and exiting. It sounds scary to do this so frequently. Is am actually surprised that nobody reported any problems even with the 1sec period. Please, stop these micro-optimizations! Livepatching is about security and reliability. It is not about speed! The fact that the transition is so lazy is a great feature. Livepatching should make the system more safe without breaking it. It is great that it could run in background and do not limit the normal workload too much. Best Regards, Petr