Hi Shameer, On 5/10/19 10:34 AM, Shameerali Kolothum Thodi wrote: > > >> -----Original Message----- >> From: Laszlo Ersek [mailto:lersek@xxxxxxxxxx] >> Sent: 09 May 2019 22:48 >> To: Igor Mammedov <imammedo@xxxxxxxxxx> >> Cc: Robin Murphy <robin.murphy@xxxxxxx>; Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@xxxxxxxxxx>; will.deacon@xxxxxxx; Catalin >> Marinas <Catalin.Marinas@xxxxxxx>; Anshuman Khandual >> <anshuman.khandual@xxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; >> linux-mm <linux-mm@xxxxxxxxx>; qemu-devel@xxxxxxxxxx; >> qemu-arm@xxxxxxxxxx; eric.auger@xxxxxxxxxx; peter.maydell@xxxxxxxxxx; >> Linuxarm <linuxarm@xxxxxxxxxx>; ard.biesheuvel@xxxxxxxxxx; Jonathan >> Cameron <jonathan.cameron@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx> >> Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt >> >> On 05/09/19 18:35, Igor Mammedov wrote: >>> On Wed, 8 May 2019 22:26:12 +0200 >>> Laszlo Ersek <lersek@xxxxxxxxxx> wrote: >>> >>>> On 05/08/19 14:50, Robin Murphy wrote: >>>>> Hi Shameer, >>>>> >>>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote: >>>>>> Hi, >>>>>> >>>>>> This series here[0] attempts to add support for PCDIMM in QEMU for >>>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at >>>>>> least >>>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable >>>>>> memory is handled by kernel. >>>>>> >>>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts >>>>>> and uses GED device to trigger the hotplug. This works fine. >>>>>> >>>>>> But when we added the DT node corresponding to the PCDIMM(cold plug >>>>>> scenario), we noticed that Guest kernel see this memory during early >> boot >>>>>> even if we are booting with ACPI. Because of this, hotpluggable memory >>>>>> may end up in zone normal and make it non-hot-un-pluggable even if >> Guest >>>>>> boots with ACPI. >>>>>> >>>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to >>>>>> interpret the >>>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to >>>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable" >>>>>> property >>>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can >>>>>> differentiate >>>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap(). >>>>>> >>>>>> But then Laszlo rightly pointed out that in order to accommodate the >>>>>> changes >>>>>> into UEFI we need to know how exactly Linux expects/handles all the >>>>>> hotpluggable memory scenarios. Please find the discussion here[2]. >>>>>> >>>>>> For ease, I am just copying the relevant comment from Laszlo below, >>>>>> >>>>>> /****** >>>>>> "Given patches #7 and #8, as I understand them, the firmware cannot >>>>>> distinguish >>>>>> hotpluggable & present, from hotpluggable & absent. The firmware >> can >>>>>> only >>>>>> skip both hotpluggable cases. That's fine in that the firmware will >>>>>> hog neither >>>>>> type -- but is that OK for the OS as well, for both ACPI boot and DT >>>>>> boot? >>>>>> >>>>>> Consider in particular the "hotpluggable & present, ACPI boot" case. >>>>>> Assuming >>>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI >> memmap >>>>>> will not include the range despite it being present at boot. >>>>>> Presumably, ACPI >>>>>> will refer to the range somehow, however. Will that not confuse the OS? >>>>>> >>>>>> When Igor raised this earlier, I suggested that >>>>>> hotpluggable-and-present should >>>>>> be added by the firmware, but also allocated immediately, as >>>>>> EfiBootServicesData >>>>>> type memory. This will prevent other drivers in the firmware from >>>>>> allocating AcpiNVS >>>>>> or Reserved chunks from the same memory range, the UEFI memmap will >>>>>> contain >>>>>> the range as EfiBootServicesData, and then the OS can release that >>>>>> allocation in >>>>>> one go early during boot. >>>>>> >>>>>> But this really has to be clarified from the Linux kernel's >>>>>> expectations. Please >>>>>> formalize all of the following cases: >>>>>> >>>>>> OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report >>>>>> as DT/ACPI should report as >>>>>> ----------------- ------------------ >>>>>> ------------------------------- ------------------------ >>>>>> >> DT present ? >> ? >>>>>> >> DT absent ? >> ? >>>>>> >> ACPI present ? >> ? >>>>>> >> ACPI absent ? >> ? >>>>>> >>>>>> Again, this table is dictated by Linux." >>>>>> >>>>>> ******/ >>>>>> >>>>>> Could you please take a look at this and let us know what is expected >>>>>> here from >>>>>> a Linux kernel view point. >>>>> >>>>> For arm64, so far we've not even been considering DT-based hotplug - as >>>>> far as I'm aware there would still be a big open question there around >>>>> notification mechanisms and how to describe them. The DT stuff so far >>>>> has come from the PowerPC folks, so it's probably worth seeing what >>>>> their ideas are. >>>>> >>>>> ACPI-wise I've always assumed/hoped that hotplug-related things should >>>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do" >>>>> would be enough for us. >>>> >>>> As far as I can see in UEFI v2.8 -- and I had checked the spec before >>>> dumping the table with the many question marks on Shameer --, all the >>>> hot-plug language in the spec refers to USB and PCI hot-plug in the >>>> preboot environment. There is not a single word about hot-plug at OS >>>> runtime (regarding any device or component type), nor about memory >>>> hot-plug (at any time). >>>> >>>> Looking to x86 appears valid -- so what does the Linux kernel expect on >>>> that architecture, in the "ACPI" rows of the table? >>> >>> I could only answer from QEMU x86 perspective. >>> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820 >>> because of different linux guests tend to cannibalize it, making it non >>> unpluggable. The last culprit I recall was KASLR. >>> >>> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if >>> it's possible (it's probably hack (spec deosn't say anything about it) >>> but it mostly works for Linux (plug/unplug) and Windows guest also >>> fine with plug part (no unplug there)). >> >> I can accept this as a perfectly valid design. Which would mean, QEMU should >> mark each hotpluggable RAM range in the DTB for the firmware with the >> special new property, regardless of its initial ("cold") plugged-ness, and then >> the firmware will not expose the range in the GCD memory space map, and >> consequently in the UEFI memmap either. >> >> IOW, our table is, thus far: >> >> OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report as >> DT/ACPI should report as >> ----------------- ------------------ ------------------------------- ------------------------ >> DT present >> ABSENT ? >> DT absent >> ABSENT ? >> ACPI present ABSENT >> PRESENT >> ACPI absent ABSENT >> ABSENT >> In the firmware, I only need to care about the GetMemoryMap() column, so I >> can work with this. > > Thank you all for the inputs. > > I assume we will still report the DT cold plug case to kernel(hotpluggable & present). > so the table will be something like this, > > OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report as DT/ACPI should report as > ----------------- ------------------ ------------------------------- ------------------------ > DT present ABSENT PRESENT > DT absent ABSENT ABSENT With DT boot, how does the OS get to know if thehotpluggable memory is present or absent? Or maybe I misunderstand the last column. Thanks Eric > ACPI present ABSENT PRESENT > ACPI absent ABSENT ABSENT > > > Can someone please file a feature request at >> <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these >> detais? > > Ok. I will do that. > > Thanks, > Shameer > >> Thanks >> Laszlo >> >>> >>> As for physical systems, there are out there ones that do report >>> hotpluggable RAM in GetMemoryMap(). >>> >>>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest >>>> re-raising the question on those lists too; at least the "ACPI" rows of >>>> the table. >>>> >>>> Thanks! >>>> Laszlo >>>> >>>>> >>>>> Robin. >>>>> >>>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed >>>>>> any valid >>>>>> points above). >>>>>> >>>>>> Thanks, >>>>>> Shameer >>>>>> [0] https://patchwork.kernel.org/cover/10890919/ >>>>>> [1] https://patchwork.kernel.org/patch/10863299/ >>>>>> [2] https://patchwork.kernel.org/patch/10890937/ >>>>>> >>>>>> >>>> >>> >