Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote:
> On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > timeout period due to busy vhost worker kthreads.
> > 
> > I have missed this detail. Miroslav told me that we have solved
> > something similar some time ago, see
> > https://lore.kernel.org/all/20220507174628.2086373-1-song@xxxxxxxxxx/
> 
> Interesting thread. I had thought about something along the lines of the
> original patch, but there are some ideas in there that I hadn't
> considered.

Could you please provide some more details about the test system?
Is there anything important to make it reproducible?

The following aspects come to my mind. It might require:

   + more workers running on the same system
   + have a dedicated CPU for the worker
   + livepatching the function called by work->fn()
   + running the same work again and again
   + huge and overloaded system


> > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note
> > that the transition is tried only once per minute. It means that there
> > are "only" 60 attempts.
> > 
> > Just by chance, does it help you to increase the timeout, please?
> 
> To be honest my test setup reproduces the problem well enough to make
> KLP wait significant time due to vhost threads, but it seldom causes it
> to hit kpatch's timeout.
> 
> Our system management software will try to load a patch tens of times in
> a day, and we've seen real-world cases where patches couldn't load
> within kpatch's timeout for multiple days. But I don't have such an
> environment readily accessible for my own testing. I can try to refine
> my test case and see if I can get it to that point.

My understanding is that you try to load the patch repeatedly but
it always fails after the 1 minute timeout. It means that it always
starts from the beginning (no livepatched process).

Is there any chance to try it with a longer timeout, for example, one
hour? It should increase the chance if there are more problematic kthreads.

> > This low timeout might be useful for testing. But in practice, it does
> > not matter when the transition is lasting one hour or even longer.
> > It takes much longer time to prepare the livepatch.
> 
> Agreed. And to be clear, we cope with the fact that patches may take
> hours or even days to get applied in some cases. The patches I sent are
> just about improving the only case I've identified which has lead to
> kpatch failing to load a patch for a day or longer.

If it is acceptable to wait hours or even days then the 1 minute
timeout is quite contra-productive. We actually do not use any timeout
at all in livepatches provided by SUSE.

Best Regards,
Petr
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux