On Wed 2021-07-07 14:49:41, Vasily Gorbik wrote:
> That's just a racy hack for now, for demonstration purposes.
>
> On an s390 system with a large number of cpus,
> klp_try_complete_transition() often cannot complete on the first
> attempt. klp_try_complete_transition() then reschedules itself as
> delayed work with a one-second delay. This adds up to a significant
> amount of time when there is a large number of livepatching transitions.
>
> This patch tries to minimize this delay by counting the processes which
> still need to be transitioned and then scheduling
> klp_try_complete_transition() right away.
>
> For an s390 LPAR with 128 cpus this reduces the livepatch kselftest
> run time from
>
> real    1m11.837s
> user    0m0.603s
> sys     0m10.940s
>
> to
>
> real    0m14.550s
> user    0m0.420s
> sys     0m5.779s
>
> And the qa_test_klp run time from
>
> real    5m15.950s
> user    0m34.447s
> sys     15m11.345s
>
> to
>
> real    3m51.987s
> user    0m27.074s
> sys     9m37.301s
>
> Would something like that be useful for production use cases?
> Any ideas on how to approach this more gracefully?

Honestly, I do not see a real-life use case for this, except maybe
speeding up a test suite. The livepatch transition is more about
reliability than about speed. In real life, a livepatch will be applied
only once in a while.

We have spent weeks thinking about and discussing the consistency model,
the code, and the barriers needed to handle the races correctly. In
particular, klp_update_patch_state() is a super-sensitive beast because
it is called without klp_lock. It might be pretty hard to synchronize it
with klp_reverse_transition() or klp_force_transition().

You would need to come up with a really convincing use case and numbers
to make it worth the effort.

Best Regards,
Petr
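
For reference, a minimal sketch of the retry-with-delay pattern the quoted
mail describes, in kernel-style C. The structure below is an illustration
based only on the behaviour described above (the transition work retrying
itself after roughly one second via schedule_delayed_work()); the function
bodies are placeholders and this is not the exact upstream
kernel/livepatch code:

#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/timer.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(klp_mutex);

static bool klp_try_complete_transition(void);

/* Work function: retake the lock and retry the transition. */
static void klp_transition_work_fn(struct work_struct *work)
{
	mutex_lock(&klp_mutex);
	klp_try_complete_transition();
	mutex_unlock(&klp_mutex);
}
static DECLARE_DELAYED_WORK(klp_transition_work, klp_transition_work_fn);

static bool klp_try_complete_transition(void)
{
	bool complete = true;

	/* ... walk all tasks and try to switch them to the new state ... */

	if (!complete) {
		/*
		 * Some tasks could not be transitioned yet; retry in
		 * roughly one second.  With many CPUs and many
		 * transitions, this one-second granularity is what
		 * adds up to the long test run times quoted above.
		 */
		schedule_delayed_work(&klp_transition_work,
				      round_jiffies_relative(HZ));
		return false;
	}

	/* ... the transition is done, clean up ... */
	return true;
}

The quoted patch would, as described, replace the fixed one-second retry
with a count of the tasks that still need to switch, so the work can be
rescheduled as soon as that count indicates another attempt may succeed.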