RE: [Question] Memory hotplug clarification for Qemu ARM/virt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@xxxxxxxxxx]
> Sent: 09 May 2019 22:48
> To: Igor Mammedov <imammedo@xxxxxxxxxx>
> Cc: Robin Murphy <robin.murphy@xxxxxxx>; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@xxxxxxxxxx>; will.deacon@xxxxxxx; Catalin
> Marinas <Catalin.Marinas@xxxxxxx>; Anshuman Khandual
> <anshuman.khandual@xxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> linux-mm <linux-mm@xxxxxxxxx>; qemu-devel@xxxxxxxxxx;
> qemu-arm@xxxxxxxxxx; eric.auger@xxxxxxxxxx; peter.maydell@xxxxxxxxxx;
> Linuxarm <linuxarm@xxxxxxxxxx>; ard.biesheuvel@xxxxxxxxxx; Jonathan
> Cameron <jonathan.cameron@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx>
> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt
> 
> On 05/09/19 18:35, Igor Mammedov wrote:
> > On Wed, 8 May 2019 22:26:12 +0200
> > Laszlo Ersek <lersek@xxxxxxxxxx> wrote:
> >
> >> On 05/08/19 14:50, Robin Murphy wrote:
> >>> Hi Shameer,
> >>>
> >>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
> >>>> Hi,
> >>>>
> >>>> This series here[0] attempts to add support for PCDIMM in QEMU for
> >>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
> >>>> least
> >>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
> >>>> memory is handled by kernel.
> >>>>
> >>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
> >>>> and uses GED device to trigger the hotplug. This works fine.
> >>>>
> >>>> But when we added the DT node corresponding to the PCDIMM(cold plug
> >>>> scenario), we noticed that Guest kernel see this memory during early
> boot
> >>>> even if we are booting with ACPI. Because of this, hotpluggable memory
> >>>> may end up in zone normal and make it non-hot-un-pluggable even if
> Guest
> >>>> boots with ACPI.
> >>>>
> >>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
> >>>> interpret the
> >>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
> >>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
> >>>> property
> >>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
> >>>> differentiate
> >>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
> >>>>
> >>>> But then Laszlo rightly pointed out that in order to accommodate the
> >>>> changes
> >>>> into UEFI we need to know how exactly Linux expects/handles all the
> >>>> hotpluggable memory scenarios. Please find the discussion here[2].
> >>>>
> >>>> For ease, I am just copying the relevant comment from Laszlo below,
> >>>>
> >>>> /******
> >>>> "Given patches #7 and #8, as I understand them, the firmware cannot
> >>>> distinguish
> >>>>   hotpluggable & present, from hotpluggable & absent. The firmware
> can
> >>>> only
> >>>>   skip both hotpluggable cases. That's fine in that the firmware will
> >>>> hog neither
> >>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
> >>>> boot?
> >>>>
> >>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
> >>>> Assuming
> >>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI
> memmap
> >>>> will not include the range despite it being present at boot.
> >>>> Presumably, ACPI
> >>>> will refer to the range somehow, however. Will that not confuse the OS?
> >>>>
> >>>> When Igor raised this earlier, I suggested that
> >>>> hotpluggable-and-present should
> >>>> be added by the firmware, but also allocated immediately, as
> >>>> EfiBootServicesData
> >>>> type memory. This will prevent other drivers in the firmware from
> >>>> allocating AcpiNVS
> >>>> or Reserved chunks from the same memory range, the UEFI memmap will
> >>>> contain
> >>>> the range as EfiBootServicesData, and then the OS can release that
> >>>> allocation in
> >>>> one go early during boot.
> >>>>
> >>>> But this really has to be clarified from the Linux kernel's
> >>>> expectations. Please
> >>>> formalize all of the following cases:
> >>>>
> >>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report
> >>>> as  DT/ACPI should report as
> >>>> -----------------  ------------------
> >>>> -------------------------------  ------------------------
> >>>>
> DT                 present             ?
>               ?
> >>>>
> DT                 absent              ?
>                ?
> >>>>
> ACPI               present             ?
>               ?
> >>>>
> ACPI               absent              ?
>               ?
> >>>>
> >>>> Again, this table is dictated by Linux."
> >>>>
> >>>> ******/
> >>>>
> >>>> Could you please take a look at this and let us know what is expected
> >>>> here from
> >>>> a Linux kernel view point.
> >>>
> >>> For arm64, so far we've not even been considering DT-based hotplug - as
> >>> far as I'm aware there would still be a big open question there around
> >>> notification mechanisms and how to describe them. The DT stuff so far
> >>> has come from the PowerPC folks, so it's probably worth seeing what
> >>> their ideas are.
> >>>
> >>> ACPI-wise I've always assumed/hoped that hotplug-related things should
> >>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
> >>> would be enough for us.
> >>
> >> As far as I can see in UEFI v2.8 -- and I had checked the spec before
> >> dumping the table with the many question marks on Shameer --, all the
> >> hot-plug language in the spec refers to USB and PCI hot-plug in the
> >> preboot environment. There is not a single word about hot-plug at OS
> >> runtime (regarding any device or component type), nor about memory
> >> hot-plug (at any time).
> >>
> >> Looking to x86 appears valid -- so what does the Linux kernel expect on
> >> that architecture, in the "ACPI" rows of the table?
> >
> > I could only answer from QEMU x86 perspective.
> > QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> > because of different linux guests tend to cannibalize it, making it non
> > unpluggable. The last culprit I recall was KASLR.
> >
> > So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> > it's possible (it's probably hack (spec deosn't say anything about it)
> > but it mostly works for Linux (plug/unplug) and Windows guest also
> > fine with plug part (no unplug there)).
> 
> I can accept this as a perfectly valid design. Which would mean, QEMU should
> mark each hotpluggable RAM range in the DTB for the firmware with the
> special new property, regardless of its initial ("cold") plugged-ness, and then
> the firmware will not expose the range in the GCD memory space map, and
> consequently in the UEFI memmap either.
> 
> IOW, our table is, thus far:
> 
> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as
> DT/ACPI should report as
> -----------------  ------------------  -------------------------------  ------------------------
> DT                 present
> ABSENT                           ?
> DT                 absent
> ABSENT                           ?
> ACPI               present             ABSENT
> PRESENT
> ACPI               absent              ABSENT
> ABSENT
> In the firmware, I only need to care about the GetMemoryMap() column, so I
> can work with this.

Thank you all for the inputs.

I assume we will still report the DT cold plug case to kernel(hotpluggable & present).
so the table will be something like this,

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ABSENT                           PRESENT
DT                 absent              ABSENT                           ABSENT
ACPI               present             ABSENT                           PRESENT
ACPI               absent              ABSENT                           ABSENT 


 Can someone please file a feature request at
> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these
> detais?

Ok. I will do that.

Thanks,
Shameer

> Thanks
> Laszlo
> 
> >
> > As for physical systems, there are out there ones that do report
> > hotpluggable RAM in GetMemoryMap().
> >
> >> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
> >> re-raising the question on those lists too; at least the "ACPI" rows of
> >> the table.
> >>
> >> Thanks!
> >> Laszlo
> >>
> >>>
> >>> Robin.
> >>>
> >>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
> >>>> any valid
> >>>> points above).
> >>>>
> >>>> Thanks,
> >>>> Shameer
> >>>> [0] https://patchwork.kernel.org/cover/10890919/
> >>>> [1] https://patchwork.kernel.org/patch/10863299/
> >>>> [2] https://patchwork.kernel.org/patch/10890937/
> >>>>
> >>>>
> >>
> >





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux