On Thu, Feb 23, 2017 at 02:25:13PM +0100, Michal Privoznik wrote: > On 02/23/2017 02:16 PM, Daniel P. Berrange wrote: > > On Thu, Feb 23, 2017 at 02:07:27PM +0100, Michal Privoznik wrote: > >> On 02/23/2017 11:48 AM, Daniel P. Berrange wrote: > >>> On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote: > >>>> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote: > >>>>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote: > >>>>>> So, majority of the code is just ready as-is. Well, with one > >>>>>> slight change: differentiate between dimm and nvdimm in places > >>>>>> like device alias generation, generating the command line and so > >>>>>> on. > >>>>>> > >>>>>> Speaking of the command line, we also need to append 'nvdimm=on' > >>>>>> to the '-machine' argument so that the nvdimm feature is > >>>>>> advertised in the ACPI tables properly. > >>>>>> > >>>>>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > >>>>> > >>>>> > >>>>>> + if (prealloc && > >>>>>> + virJSONValueObjectAdd(props, > >>>>>> + "b:prealloc", true, > >>>>>> + NULL) < 0) > >>>>>> + goto cleanup; > >>>>> > >>>>> As discussed on IRC, using prealloc with QEMU causes it to memset() > >>>>> the first byte of every page of memory to 0. With NVDIMM this is > >>>>> obviously corrupting application data stored in the NVDIMM. > >>>>> > >>>>> Obviously this needs fixing in QEMU, but I would think that libvirt > >>>>> needs to block use of prealloc==true when running against existing > >>>>> broken QEMU versions, otherwise users are going to be rather upset > >>>>> to see their data corrupted on every boot > >>>> > >>>> They are, but should we work around broken QEMUs? And if so - how do we > >>>> detect whether the qemu we are dealing with is broken or already fixed > >>>> in that respect? > >>> > >>> Most likely we'll just have todo a version number check I think, since > >>> fixing this isn't going to involve any externally visible change to > >>> QEMU. > >> > >> What about backports then? If some distro decides to backport your fix > >> posted on the qemu list, this check here would prevent them from running > >> fixed qemu. Also, I don't think we want to work around qemu bugs. > > > > If the distro backports the QEMU fix, then they would have to adjust > > their libvirt to relax the version check too. Obviously we try to > > avoid this kind of scenario, but sometimes there's no other option, > > and this looks like one of those times. > > Okay, fair enough. > > > > >>>> On the other hand, we can just not set prealloc true for nvdimm. That > >>>> will have one downside though - after qemu mmaps() the file, kernel is > >>>> not forced to create a private copy of the pages. > >>> > >>> If the guest has requested prealloc, then I don't think we can ignore > >>> it for NVDIMM, because its a clear violation of the memory guarantee > >>> we're claiming to provide apps. Thus I think we've no option but to > >>> report an error if we see prealloc + broken QEMU > >> > >> Do you mean user instead of guest? Because it's user who requests > >> prealloc (well, in theory it is user; libvirt does not allow users to > >> request prealloc but rather has some rules worked in that request it > >> instead). > > > > Sorry yes, i meant user. eg if they set > > > > <memoryBacking> > > <allocation mode="immediate"/> > > </memoryBacking> > > > > I guess this is fairly new syntax so probably few people/apps are using > > it. IIRC, if this is not present in XML, then we just automagically > > assume allocation=immediate if using huge pages. > > > > So I guess we could say, if allocation=immediate is *not* in the guest > > XML, then NVDIMMs are assumed to have allocation=ondemand by default. > > > > It'd be inconsistent behaviour, but not wrong. > > > > That way we'd only need to raise an error if the user had explicitly > > set allocation=immediate > > Frankly, I'd rather be consistent AND bug-free at the same time. Can't > we make NVDIMM work with just fixed QEMUs? At the level code, the nvdimm > capability would be set if nvdimm is found in a list of devices provided > by qemu AND if the qemu version is at least 2.8 (assuming your fix makes > it into the release). Yep that's a valid approach to take assuming QEMU get a fix done in a timely manner, so that people aren't pressuring us to support the buggy QEMU. Lets aim todo that until someone complains :-) Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list