On Tue, 16 Feb 2016 02:35:41 +0800
Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:

> On 02/16/2016 01:24 AM, Igor Mammedov wrote:
> > On Mon, 15 Feb 2016 23:53:13 +0800
> > Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >
> >> On 02/15/2016 09:32 PM, Igor Mammedov wrote:
> >>> On Mon, 15 Feb 2016 13:45:59 +0200
> >>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >>>
> >>>> On Mon, Feb 15, 2016 at 11:47:42AM +0100, Igor Mammedov wrote:
> >>>>> On Mon, 15 Feb 2016 18:13:38 +0800
> >>>>> Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>>> On 02/15/2016 05:18 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Mon, Feb 15, 2016 at 10:11:05AM +0100, Igor Mammedov wrote:
> >>>>>>>> On Sun, 14 Feb 2016 13:57:27 +0800
> >>>>>>>> Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>> On 02/08/2016 07:03 PM, Igor Mammedov wrote:
> >>>>>>>>>> On Wed, 13 Jan 2016 02:50:05 +0800
> >>>>>>>>>> Xiao Guangrong <guangrong.xiao@xxxxxxxxxxxxxxx> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> A 32-bit IO port starting at 0x0a18 in the guest is reserved for
> >>>>>>>>>>> NVDIMM ACPI emulation. The table, NVDIMM_DSM_MEM_FILE, will be
> >>>>>>>>>>> patched into the NVDIMM ACPI binary code.
> >>>>>>>>>>>
> >>>>>>>>>>> OSPM uses this port to tell QEMU the final address of the DSM
> >>>>>>>>>>> memory and to notify QEMU to emulate the DSM method.
> >>>>>>>>>> Would you need to pass control to QEMU if each NVDIMM had its whole
> >>>>>>>>>> label area MemoryRegion mapped right after its storage MemoryRegion?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> No, the label data is not mapped into the guest's address space; it
> >>>>>>>>> can only be accessed indirectly via the DSM method.
> >>>>>>>> Yep, per spec the label data should be accessed via _DSM, but the
> >>>>>>>> question wasn't about that.
> >>>>>>
> >>>>>> Ah, sorry, I missed your question.
> >>>>>>
> >>>>>>>> Why would one map only a 4KB window and serialize the label data
> >>>>>>>> through it if it could be mapped as a whole? That way the _DSM method
> >>>>>>>> would be much less complicated and there would be no need to
> >>>>>>>> add/support a protocol for its serialization.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Is it ever accessed on the data path? If not, I prefer the current
> >>>>>>> approach:
> >>>>>>
> >>>>>> The label data is only accessed via two DSM commands - Get Namespace
> >>>>>> Label Data and Set Namespace Label Data; nothing else needs to be
> >>>>>> emulated.
> >>>>>>
> >>>>>>> limit the window used; the serialization protocol seems rather simple.
> >>>>>>>
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>> The label data is at least 128K, which is quite big for the BIOS since
> >>>>>> it allocates memory in the 0 ~ 4G range, which is a tight region. It
> >>>>>> also needs the guest OS to support a larger max-xfer (the maximum size
> >>>>>> that can be transferred at one time); the size in the current Linux
> >>>>>> NVDIMM driver is 4K.
> >>>>>>
> >>>>>> However, using a larger DSM buffer can help us simplify NVDIMM hotplug
> >>>>>> for the case where so many NVDIMM devices are present in the system
> >>>>>> that their FIT info cannot fit into one page. Each PMEM-only device
> >>>>>> needs 0xb8 bytes and we can append at most 256 memory devices, so 12
> >>>>>> pages are needed to contain this info.
> >>>>>> The prototype we implemented uses a self-defined protocol to read
> >>>>>> pieces of _FIT and concatenate them before returning to the guest;
> >>>>>> please refer to:
> >>>>>> https://github.com/xiaogr/qemu/commit/c46ce01c8433ac0870670304360b3c4aa414143a
> >>>>>>
> >>>>>> As 12 pages are not a small region for the BIOS, and the _FIT size may
> >>>>>> be extended in future development (e.g. if PBLK is introduced), I am
> >>>>>> not sure if we need this. Of course, another approach to simplify it
> >>>>>> is to limit the number of NVDIMM devices to make sure their _FIT is
> >>>>>> < 4K.
> >>>>> My suggestion is not to have only one label area for every NVDIMM but
> >>>>> rather to map each label area right after each NVDIMM's data memory.
> >>>>> That way _DSM can be made non-serialized and the guest could handle
> >>>>> label data in parallel.
> >>>>
> >>>> I think that alignment considerations would mean we are burning up
> >>>> 1G of phys address space for this. For PAE we only have 64G
> >>>> of this address space, so this would be a problem.
> >>> It's true that it burns away address space, however that just means
> >>> that PAE guests would not be able to handle as many NVDIMMs as 64-bit
> >>> guests. The same applies to DIMMs as well, with alignment enforced. If
> >>> one needs more DIMMs he/she can switch to a 64-bit guest to use them.
> >>>
> >>> It's a trade-off of inefficient GPA consumption vs. efficient NVDIMM
> >>> access. Also, with a fully mapped label area for each NVDIMM we don't
> >>> have to introduce and maintain any guest-visible serialization protocol
> >>> (a protocol for serializing _DSM via a 4K window), which becomes ABI.
> >>
> >> It's true for label access, but not for the long term, as we will need
> >> to support other _DSM commands such as vendor-specific commands and PBLK
> >> DSM commands; NVDIMM MCE related commands will also be introduced in the
> >> future, so we will come back here at that time. :(
> > I believe block mode NVDIMM would also need a per-NVDIMM mapping for
> > performance reasons (parallel access).
> > As for the rest, could those commands go via the MMIO that we usually
> > use for the control path?
>
> So both input data and output data go through a single MMIO region; we
> need to introduce a protocol to pass these data, isn't that complex?
>
> And is there any MMIO we can reuse (even more complex?), or should we
> allocate this MMIO page (the old question - where to allocate it)?

Maybe you could reuse/extend the memhotplug IO interface, or alternatively,
as Michael suggested, add a vendor-specific PCI_Config. I'd suggest the PM
device for that (hw/acpi/[piix4.c|ich9.c]), which I like even better since
you won't need to care about which ports to allocate at all.
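
To illustrate, here is a rough, untested sketch of how the control window
could hang off the PM device; all names in it (NvdimmAcpiState,
nvdimm_acpi_register, NVDIMM_ACPI_IO_BASE) are made up for the example and
are not existing QEMU code:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Made-up offset/length; the real values would be whatever the PM device
 * can spare in its IO space. */
#define NVDIMM_ACPI_IO_BASE 0x0a18
#define NVDIMM_ACPI_IO_LEN  4            /* one 32-bit register */

typedef struct NvdimmAcpiState {
    MemoryRegion io;
} NvdimmAcpiState;

static uint64_t nvdimm_acpi_read(void *opaque, hwaddr addr, unsigned size)
{
    /* OSPM reads the status/result of the last emulated _DSM here. */
    return 0;
}

static void nvdimm_acpi_write(void *opaque, hwaddr addr, uint64_t val,
                              unsigned size)
{
    /* OSPM writes the GPA of the DSM buffer here to kick QEMU's emulation. */
}

static const MemoryRegionOps nvdimm_acpi_ops = {
    .read = nvdimm_acpi_read,
    .write = nvdimm_acpi_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
    .valid.min_access_size = 4,
    .valid.max_access_size = 4,
};

/* Called from the PM device's realize (piix4/ich9), so the window is just
 * another subregion of a device we already have. */
static void nvdimm_acpi_register(NvdimmAcpiState *s, Object *owner,
                                 MemoryRegion *parent_io)
{
    memory_region_init_io(&s->io, owner, &nvdimm_acpi_ops, s,
                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
    memory_region_add_subregion(parent_io, NVDIMM_ACPI_IO_BASE, &s->io);
}

The AML side then only needs an OperationRegion/Field over that window, the
same way the memory hotplug code does it.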
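
For what it's worth, here is a purely hypothetical sketch of the kind of
layout such a shared page could use, just to make the discussion concrete
(these are not the structures from your patch):

#include <stdint.h>

#define NVDIMM_DSM_PAGE_SIZE 4096

/* Input view of the page: filled in by OSPM before it writes the page's
 * GPA to the control register. */
struct nvdimm_dsm_in {
    uint32_t handle;    /* NFIT handle of the target NVDIMM */
    uint32_t revision;  /* _DSM revision id */
    uint32_t function;  /* e.g. get/set namespace label data */
    uint8_t  arg[NVDIMM_DSM_PAGE_SIZE - 3 * sizeof(uint32_t)];
} __attribute__((packed));

/* Output view of the same page: filled in by QEMU before it returns. */
struct nvdimm_dsm_out {
    uint32_t len;       /* number of valid bytes, header included */
    uint8_t  data[NVDIMM_DSM_PAGE_SIZE - sizeof(uint32_t)];
} __attribute__((packed));

OSPM fills the input view, writes the page's GPA to the control register,
and reads the result back from the same page as the output view - that is
the whole guest-visible part of it.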