On 02/23/2017 02:16 PM, Daniel P. Berrange wrote: > On Thu, Feb 23, 2017 at 02:07:27PM +0100, Michal Privoznik wrote: >> On 02/23/2017 11:48 AM, Daniel P. Berrange wrote: >>> On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote: >>>> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote: >>>>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote: >>>>>> So, majority of the code is just ready as-is. Well, with one >>>>>> slight change: differentiate between dimm and nvdimm in places >>>>>> like device alias generation, generating the command line and so >>>>>> on. >>>>>> >>>>>> Speaking of the command line, we also need to append 'nvdimm=on' >>>>>> to the '-machine' argument so that the nvdimm feature is >>>>>> advertised in the ACPI tables properly. >>>>>> >>>>>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >>>>> >>>>> >>>>>> + if (prealloc && >>>>>> + virJSONValueObjectAdd(props, >>>>>> + "b:prealloc", true, >>>>>> + NULL) < 0) >>>>>> + goto cleanup; >>>>> >>>>> As discussed on IRC, using prealloc with QEMU causes it to memset() >>>>> the first byte of every page of memory to 0. With NVDIMM this is >>>>> obviously corrupting application data stored in the NVDIMM. >>>>> >>>>> Obviously this needs fixing in QEMU, but I would think that libvirt >>>>> needs to block use of prealloc==true when running against existing >>>>> broken QEMU versions, otherwise users are going to be rather upset >>>>> to see their data corrupted on every boot >>>> >>>> They are, but should we work around broken QEMUs? And if so - how do we >>>> detect whether the qemu we are dealing with is broken or already fixed >>>> in that respect? >>> >>> Most likely we'll just have todo a version number check I think, since >>> fixing this isn't going to involve any externally visible change to >>> QEMU. >> >> What about backports then? If some distro decides to backport your fix >> posted on the qemu list, this check here would prevent them from running >> fixed qemu. Also, I don't think we want to work around qemu bugs. > > If the distro backports the QEMU fix, then they would have to adjust > their libvirt to relax the version check too. Obviously we try to > avoid this kind of scenario, but sometimes there's no other option, > and this looks like one of those times. Okay, fair enough. > >>>> On the other hand, we can just not set prealloc true for nvdimm. That >>>> will have one downside though - after qemu mmaps() the file, kernel is >>>> not forced to create a private copy of the pages. >>> >>> If the guest has requested prealloc, then I don't think we can ignore >>> it for NVDIMM, because its a clear violation of the memory guarantee >>> we're claiming to provide apps. Thus I think we've no option but to >>> report an error if we see prealloc + broken QEMU >> >> Do you mean user instead of guest? Because it's user who requests >> prealloc (well, in theory it is user; libvirt does not allow users to >> request prealloc but rather has some rules worked in that request it >> instead). > > Sorry yes, i meant user. eg if they set > > <memoryBacking> > <allocation mode="immediate"/> > </memoryBacking> > > I guess this is fairly new syntax so probably few people/apps are using > it. IIRC, if this is not present in XML, then we just automagically > assume allocation=immediate if using huge pages. > > So I guess we could say, if allocation=immediate is *not* in the guest > XML, then NVDIMMs are assumed to have allocation=ondemand by default. > > It'd be inconsistent behaviour, but not wrong. > > That way we'd only need to raise an error if the user had explicitly > set allocation=immediate Frankly, I'd rather be consistent AND bug-free at the same time. Can't we make NVDIMM work with just fixed QEMUs? At the level code, the nvdimm capability would be set if nvdimm is found in a list of devices provided by qemu AND if the qemu version is at least 2.8 (assuming your fix makes it into the release). Michal > > Regards, > Daniel > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list