Re: [PATCH] qemu: Refresh capabilities when creating resctrl allocation

On Fri, Feb 02, 2018 at 14:57:54 +0100, Martin Kletzander wrote:
> On Fri, Feb 02, 2018 at 02:29:03PM +0100, Peter Krempa wrote:
> > On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> > > Since one of the things in capabilities (info from resctrl updated with data
> > > about caches) can be changed on the system by remounting /sys/fs/resctrl with
> > > different options, the capabilities need to be refreshed.  There is a better fix
> > > in the works, but it's going to be way bigger than this (hence the XXX note
> > > there), so for the time being let's work around this.  And in order not to slow
> > > down the domain starting, only get the capabilities if there are any cachetunes.
> > > 
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
> > 
> > This BZ describes a crash if the filesystem is remounted, but you are
> > attempting to fix this not by fixing the code that crashed but by
> > re-loading the information if possibly somebody remounted it.
> > 
> > This does not seem to be the correct fix since you still have a race
> > window, where the options can be changed after the refresh is executed
> > and prior to using them in the code where it actually crashed.
> 
> Yeah, I'm looking at that as well.  It will need a restructuring (moving some
> conf code to util - it'll also look nicer), but a fix for exactly what is
> happening here is enough for now.

Well, I'm not okay with selling this as a fix for the crash described in
the bugzilla. I might be okay with doing this as a mitigation for stale
data, but this is not a fix for the crash in any way.

We have the same kind of issue (minus the crash) with hugetlbfs mount
data or host cpu maps in the numa host description and we don't refresh
the capabilities at every start of those.

So there's the problem that the cache filesystem data is stale and your
host might fail to start, which is inconvenient but not that big of a
problem.

The second issue is the crash if the data is stale and this certainly
does not fix that.
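
To illustrate the distinction, here is a rough Python model (not libvirt
code; every name in it is made up for illustration). Refreshing only
narrows the window: if the mount changes after the refresh, code that
trusts the snapshot still blows up, while code that re-validates at the
point of use reports an error instead.

```python
# Hypothetical model of the race; none of these names are libvirt APIs.

class StaleCapsError(Exception):
    """Reported to the caller instead of crashing."""


def refresh_caps(live):
    # Snapshot of the live host state (cache level -> number of ways).
    return dict(live)


def alloc_trusting_snapshot(caps, live, level):
    # Decides from the snapshot, then touches the live state.
    if level in caps:
        return live[level]  # KeyError here stands in for the crash
    raise StaleCapsError(level)


def alloc_checking_live(caps, live, level):
    # Re-validates at the point of use, so a remount yields an error.
    ways = live.get(level)
    if ways is None:
        raise StaleCapsError(level)
    return ways
```

If the live state loses an entry between refresh_caps() and the
allocation, the first variant raises KeyError (the "crash") and the
second raises StaleCapsError (a failed start, which is the inconvenient
but tolerable case).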

I will be okay with this patch (since it's in an unlikely code path) if
you rewrite the commit message and completely drop the mention of the
bugzilla above. This patch simply does not fix that BZ.
