Re: Error Cannot acquire state change lock from remoteDispatchDomainMigratePrepare3Params during live migration of domains

On Thu, Mar 07, 2024 at 04:20:32PM +0100, Michal Prívozník wrote:
> On 3/7/24 10:51, Martin Kletzander wrote:
> > On Wed, Mar 06, 2024 at 05:17:36PM +0100, Christian Rohmann via Users
> > wrote:
> >> Hallo libvirt-users!
> >>
> > 
> > Hi, I'll try to reply in the simplest possible way.
> > 
> >> we observe lock-ups / timeouts in prometheus-libvirt-exporter
> >> (https://github.com/inovex/prometheus-libvirt-exporter) when
> >> libvirt is live-migrating domains:
> >>
> >>> Timed out during operation: cannot acquire state change lock (held by
> >>> monitor=remoteDispatchDomainMigratePrepare3Params)
> >>
> >>
> >> All of the source code can be found at:
> >> https://github.com/inovex/prometheus-libvirt-exporter/blob/master/pkg/exporter/prometheus-libvirt-exporter.go.
> >> Basically the error happens when DomainMemoryStats or other operational
> >> domain info is queried via the libvirt socket.
> >>
> > 
> > Yes, the domain is being modified by the migration, so it is locked.
> 
> While this is true, the "lock" (or rather, the job) is an async one,
> meaning a QUERY job can still be acquired. Only a MODIFY job should
> have to wait in the queue.
> 
> What's rather weird is - the thread holding the job is 'MigratePrepare'
> which usually isn't that long.

I wonder if something is hitting the 'max_client_requests' limit and
getting stalled.

The initial thread message here says the lockup is happening during
bulk concurrent live migrations of 200 VMs, 5 at a time.

The default 'max_client_requests' is 5.... DANGER WILL ROBINSON...

With live migration making requests across multiple libvirt daemons,
if the target host has filled its queue of 5 requests with long-running
operations and a 'prepare migrate' call then comes in, that call will
get stalled behind a possibly slow operation at the RPC dispatch level.
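
To illustrate the kind of stall I mean, here is a toy model in Python.
It is only a sketch of a bounded request queue, not libvirt's actual
RPC dispatch code; the function name and the 30 second / 0.1 second
durations are made up for the example.

```python
import heapq

def start_times(durations, limit):
    """Return when each request starts being processed.

    All requests are queued in order at t=0 and at most `limit`
    of them may be in flight at the same time.
    """
    busy = []    # min-heap of finish times of in-flight requests
    starts = []
    for duration in durations:
        if len(busy) < limit:
            start = 0.0                  # a slot is free right away
        else:
            start = heapq.heappop(busy)  # wait for the earliest slot to free up
        starts.append(start)
        heapq.heappush(busy, start + duration)
    return starts

# Five slow calls (30s each, illustrative) followed by a quick "prepare" call.
requests = [30.0] * 5 + [0.1]
print(start_times(requests, limit=5)[-1])    # wait seen by the prepare call
print(start_times(requests, limit=100)[-1])  # same call with a larger limit
```

With a limit of 5 the quick call cannot start until one of the slow
ones has finished, whereas with a limit of 100 it starts immediately.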

I'd suggest bumping 'max_client_requests' to 100 and seeing if the
problem goes away.
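
The setting lives in the daemon config file, e.g. a line such as
'max_client_requests = 100' in /etc/libvirt/libvirtd.conf (I'm assuming
the monolithic libvirtd here; the file differs if you run the modular
daemons), and the daemon needs a restart to pick it up. As a rough
sketch, a small Python helper like the one below could be used to check
which value a given config text would yield. The helper name and the
sample text are hypothetical, and the parsing is deliberately simplistic.

```python
import re

DEFAULT_MAX_CLIENT_REQUESTS = 5  # default value as quoted in this thread

def max_client_requests(conf_text):
    """Return the max_client_requests value set in conf_text,
    or the default when the setting is absent or commented out."""
    value = DEFAULT_MAX_CLIENT_REQUESTS
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so comments are ignored.
        m = re.match(r"\s*max_client_requests\s*=\s*(\d+)", line)
        if m:
            value = int(m.group(1))  # the last assignment wins in this sketch
    return value

sample = """
#max_client_requests = 5
max_workers = 20
max_client_requests = 100
"""
print(max_client_requests(sample))
```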

If so I wonder if we shouldn't raise our out of the box limits.
'5' is pretty low considering the scale of virtualization hosts
in the modern world, and where even my laptop has 20 CPUs and
64 GB of RAM.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|