Re: [PATCH 05/14] qemu: Implement NVDIMM

Michal Privoznik <mprivozn@xxxxxxxxxx> · Thu, 23 Feb 2017 14:25:13 +0100

On 02/23/2017 02:16 PM, Daniel P. Berrange wrote:
> On Thu, Feb 23, 2017 at 02:07:27PM +0100, Michal Privoznik wrote:
>> On 02/23/2017 11:48 AM, Daniel P. Berrange wrote:
>>> On Thu, Feb 23, 2017 at 11:16:17AM +0100, Michal Privoznik wrote:
>>>> On 02/23/2017 11:02 AM, Daniel P. Berrange wrote:
>>>>> On Thu, Feb 23, 2017 at 10:02:48AM +0100, Michal Privoznik wrote:
>>>>>> So, majority of the code is just ready as-is. Well, with one
>>>>>> slight change: differentiate between dimm and nvdimm in places
>>>>>> like device alias generation, generating the command line and so
>>>>>> on.
>>>>>>
>>>>>> Speaking of the command line, we also need to append 'nvdimm=on'
>>>>>> to the '-machine' argument so that the nvdimm feature is
>>>>>> advertised in the ACPI tables properly.
>>>>>>
>>>>>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
>>>>>
>>>>>
>>>>>> +        if (prealloc &&
>>>>>> +            virJSONValueObjectAdd(props,
>>>>>> +                                  "b:prealloc", true,
>>>>>> +                                  NULL) < 0)
>>>>>> +            goto cleanup;
>>>>>
>>>>> As discussed on IRC, using prealloc with QEMU causes it to memset()
>>>>> the first byte of every page of memory to 0. With NVDIMM this is
>>>>> obviously corrupting application data stored in the NVDIMM.
>>>>>
>>>>> Obviously this needs fixing in QEMU, but I would think that libvirt
>>>>> needs to block use of prealloc==true when running against existing
>>>>> broken QEMU versions, otherwise users are going to be rather upset
>>>>> to see their data corrupted on every boot
>>>>
>>>> They are, but should we work around broken QEMUs? And if so - how do we
>>>> detect whether the qemu we are dealing with is broken or already fixed
>>>> in that respect?
>>>
>>> Most likely we'll just have todo a version number check I think, since
>>> fixing this isn't going to involve any externally visible change to
>>> QEMU.
>>
>> What about backports then? If some distro decides to backport your fix
>> posted on the qemu list, this check here would prevent them from running
>> fixed qemu. Also, I don't think we want to work around qemu bugs.
> 
> If the distro backports the QEMU fix, then they would have to adjust
> their libvirt to relax the version check too. Obviously we try to
> avoid this kind of scenario, but sometimes there's no other option,
> and this looks like one of those times.

Okay, fair enough.

> 
>>>> On the other hand, we can just not set prealloc true for nvdimm. That
>>>> will have one downside though - after qemu mmaps() the file, kernel is
>>>> not forced to create a private copy of the pages.
>>>
>>> If the guest has requested prealloc, then I don't think we can ignore
>>> it for NVDIMM, because its a clear violation of the memory guarantee
>>> we're claiming to provide apps. Thus I think we've no option but to
>>> report an error if we see prealloc + broken QEMU
>>
>> Do you mean user instead of guest? Because it's user who requests
>> prealloc (well, in theory it is user; libvirt does not allow users to
>> request prealloc but rather has some rules worked in that request it
>> instead).
> 
> Sorry yes, i meant user. eg if they set
> 
>   <memoryBacking>
>     <allocation mode="immediate"/>
>   </memoryBacking>
> 
> I guess this is fairly new syntax so probably few people/apps are using
> it. IIRC, if this is not present in XML, then we just automagically
> assume  allocation=immediate if using huge pages.
> 
> So I guess we could say, if allocation=immediate is *not* in the guest
> XML, then NVDIMMs are assumed to have allocation=ondemand by default.
> 
> It'd be inconsistent behaviour, but not wrong.
> 
> That way we'd only need to raise an error if the user had explicitly
> set allocation=immediate

Frankly, I'd rather be consistent AND bug-free at the same time. Can't
we make NVDIMM work with just fixed QEMUs? At the level code, the nvdimm
capability would be set if nvdimm is found in a list of devices provided
by qemu AND if the qemu version is at least 2.8 (assuming your fix makes
it into the release).

Michal

> 
> Regards,
> Daniel
> 

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list