On 05/09/19 18:35, Igor Mammedov wrote: > On Wed, 8 May 2019 22:26:12 +0200 > Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > >> On 05/08/19 14:50, Robin Murphy wrote: >>> Hi Shameer, >>> >>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote: >>>> Hi, >>>> >>>> This series here[0] attempts to add support for PCDIMM in QEMU for >>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at >>>> least >>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable >>>> memory is handled by kernel. >>>> >>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts >>>> and uses GED device to trigger the hotplug. This works fine. >>>> >>>> But when we added the DT node corresponding to the PCDIMM(cold plug >>>> scenario), we noticed that Guest kernel see this memory during early boot >>>> even if we are booting with ACPI. Because of this, hotpluggable memory >>>> may end up in zone normal and make it non-hot-un-pluggable even if Guest >>>> boots with ACPI. >>>> >>>> Further discussions[1] revealed that, EDK2 UEFI has no means to >>>> interpret the >>>> ACPI content from Qemu(this is designed to do so) and uses DT info to >>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable" >>>> property >>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can >>>> differentiate >>>> the nodes and exclude the hotpluggable ones from GetMemoryMap(). >>>> >>>> But then Laszlo rightly pointed out that in order to accommodate the >>>> changes >>>> into UEFI we need to know how exactly Linux expects/handles all the >>>> hotpluggable memory scenarios. Please find the discussion here[2]. >>>> >>>> For ease, I am just copying the relevant comment from Laszlo below, >>>> >>>> /****** >>>> "Given patches #7 and #8, as I understand them, the firmware cannot >>>> distinguish >>>> hotpluggable & present, from hotpluggable & absent. The firmware can >>>> only >>>> skip both hotpluggable cases. That's fine in that the firmware will >>>> hog neither >>>> type -- but is that OK for the OS as well, for both ACPI boot and DT >>>> boot? >>>> >>>> Consider in particular the "hotpluggable & present, ACPI boot" case. >>>> Assuming >>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap >>>> will not include the range despite it being present at boot. >>>> Presumably, ACPI >>>> will refer to the range somehow, however. Will that not confuse the OS? >>>> >>>> When Igor raised this earlier, I suggested that >>>> hotpluggable-and-present should >>>> be added by the firmware, but also allocated immediately, as >>>> EfiBootServicesData >>>> type memory. This will prevent other drivers in the firmware from >>>> allocating AcpiNVS >>>> or Reserved chunks from the same memory range, the UEFI memmap will >>>> contain >>>> the range as EfiBootServicesData, and then the OS can release that >>>> allocation in >>>> one go early during boot. >>>> >>>> But this really has to be clarified from the Linux kernel's >>>> expectations. Please >>>> formalize all of the following cases: >>>> >>>> OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report >>>> as DT/ACPI should report as >>>> ----------------- ------------------ >>>> ------------------------------- ------------------------ >>>> DT present ? ? >>>> DT absent ? ? >>>> ACPI present ? ? >>>> ACPI absent ? ? >>>> >>>> Again, this table is dictated by Linux." >>>> >>>> ******/ >>>> >>>> Could you please take a look at this and let us know what is expected >>>> here from >>>> a Linux kernel view point. >>> >>> For arm64, so far we've not even been considering DT-based hotplug - as >>> far as I'm aware there would still be a big open question there around >>> notification mechanisms and how to describe them. The DT stuff so far >>> has come from the PowerPC folks, so it's probably worth seeing what >>> their ideas are. >>> >>> ACPI-wise I've always assumed/hoped that hotplug-related things should >>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do" >>> would be enough for us. >> >> As far as I can see in UEFI v2.8 -- and I had checked the spec before >> dumping the table with the many question marks on Shameer --, all the >> hot-plug language in the spec refers to USB and PCI hot-plug in the >> preboot environment. There is not a single word about hot-plug at OS >> runtime (regarding any device or component type), nor about memory >> hot-plug (at any time). >> >> Looking to x86 appears valid -- so what does the Linux kernel expect on >> that architecture, in the "ACPI" rows of the table? > > I could only answer from QEMU x86 perspective. > QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820 > because of different linux guests tend to cannibalize it, making it non > unpluggable. The last culprit I recall was KASLR. > > So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if > it's possible (it's probably hack (spec deosn't say anything about it) > but it mostly works for Linux (plug/unplug) and Windows guest also > fine with plug part (no unplug there)). I can accept this as a perfectly valid design. Which would mean, QEMU should mark each hotpluggable RAM range in the DTB for the firmware with the special new property, regardless of its initial ("cold") plugged-ness, and then the firmware will not expose the range in the GCD memory space map, and consequently in the UEFI memmap either. IOW, our table is, thus far: OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report as DT/ACPI should report as ----------------- ------------------ ------------------------------- ------------------------ DT present ABSENT ? DT absent ABSENT ? ACPI present ABSENT PRESENT ACPI absent ABSENT ABSENT In the firmware, I only need to care about the GetMemoryMap() column, so I can work with this. Can someone please file a feature request at <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these detais? Thanks Laszlo > > As for physical systems, there are out there ones that do report > hotpluggable RAM in GetMemoryMap(). > >> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest >> re-raising the question on those lists too; at least the "ACPI" rows of >> the table. >> >> Thanks! >> Laszlo >> >>> >>> Robin. >>> >>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed >>>> any valid >>>> points above). >>>> >>>> Thanks, >>>> Shameer >>>> [0] https://patchwork.kernel.org/cover/10890919/ >>>> [1] https://patchwork.kernel.org/patch/10863299/ >>>> [2] https://patchwork.kernel.org/patch/10890937/ >>>> >>>> >> >