> -----Original Message----- > From: Laszlo Ersek [mailto:lersek@xxxxxxxxxx] > Sent: 09 May 2019 22:48 > To: Igor Mammedov <imammedo@xxxxxxxxxx> > Cc: Robin Murphy <robin.murphy@xxxxxxx>; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@xxxxxxxxxx>; will.deacon@xxxxxxx; Catalin > Marinas <Catalin.Marinas@xxxxxxx>; Anshuman Khandual > <anshuman.khandual@xxxxxxx>; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; > linux-mm <linux-mm@xxxxxxxxx>; qemu-devel@xxxxxxxxxx; > qemu-arm@xxxxxxxxxx; eric.auger@xxxxxxxxxx; peter.maydell@xxxxxxxxxx; > Linuxarm <linuxarm@xxxxxxxxxx>; ard.biesheuvel@xxxxxxxxxx; Jonathan > Cameron <jonathan.cameron@xxxxxxxxxx>; xuwei (O) <xuwei5@xxxxxxxxxx> > Subject: Re: [Question] Memory hotplug clarification for Qemu ARM/virt > > On 05/09/19 18:35, Igor Mammedov wrote: > > On Wed, 8 May 2019 22:26:12 +0200 > > Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > > > >> On 05/08/19 14:50, Robin Murphy wrote: > >>> Hi Shameer, > >>> > >>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote: > >>>> Hi, > >>>> > >>>> This series here[0] attempts to add support for PCDIMM in QEMU for > >>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at > >>>> least > >>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable > >>>> memory is handled by kernel. > >>>> > >>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts > >>>> and uses GED device to trigger the hotplug. This works fine. > >>>> > >>>> But when we added the DT node corresponding to the PCDIMM(cold plug > >>>> scenario), we noticed that Guest kernel see this memory during early > boot > >>>> even if we are booting with ACPI. Because of this, hotpluggable memory > >>>> may end up in zone normal and make it non-hot-un-pluggable even if > Guest > >>>> boots with ACPI. > >>>> > >>>> Further discussions[1] revealed that, EDK2 UEFI has no means to > >>>> interpret the > >>>> ACPI content from Qemu(this is designed to do so) and uses DT info to > >>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable" > >>>> property > >>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can > >>>> differentiate > >>>> the nodes and exclude the hotpluggable ones from GetMemoryMap(). > >>>> > >>>> But then Laszlo rightly pointed out that in order to accommodate the > >>>> changes > >>>> into UEFI we need to know how exactly Linux expects/handles all the > >>>> hotpluggable memory scenarios. Please find the discussion here[2]. > >>>> > >>>> For ease, I am just copying the relevant comment from Laszlo below, > >>>> > >>>> /****** > >>>> "Given patches #7 and #8, as I understand them, the firmware cannot > >>>> distinguish > >>>> hotpluggable & present, from hotpluggable & absent. The firmware > can > >>>> only > >>>> skip both hotpluggable cases. That's fine in that the firmware will > >>>> hog neither > >>>> type -- but is that OK for the OS as well, for both ACPI boot and DT > >>>> boot? > >>>> > >>>> Consider in particular the "hotpluggable & present, ACPI boot" case. > >>>> Assuming > >>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI > memmap > >>>> will not include the range despite it being present at boot. > >>>> Presumably, ACPI > >>>> will refer to the range somehow, however. Will that not confuse the OS? > >>>> > >>>> When Igor raised this earlier, I suggested that > >>>> hotpluggable-and-present should > >>>> be added by the firmware, but also allocated immediately, as > >>>> EfiBootServicesData > >>>> type memory. This will prevent other drivers in the firmware from > >>>> allocating AcpiNVS > >>>> or Reserved chunks from the same memory range, the UEFI memmap will > >>>> contain > >>>> the range as EfiBootServicesData, and then the OS can release that > >>>> allocation in > >>>> one go early during boot. > >>>> > >>>> But this really has to be clarified from the Linux kernel's > >>>> expectations. Please > >>>> formalize all of the following cases: > >>>> > >>>> OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report > >>>> as DT/ACPI should report as > >>>> ----------------- ------------------ > >>>> ------------------------------- ------------------------ > >>>> > DT present ? > ? > >>>> > DT absent ? > ? > >>>> > ACPI present ? > ? > >>>> > ACPI absent ? > ? > >>>> > >>>> Again, this table is dictated by Linux." > >>>> > >>>> ******/ > >>>> > >>>> Could you please take a look at this and let us know what is expected > >>>> here from > >>>> a Linux kernel view point. > >>> > >>> For arm64, so far we've not even been considering DT-based hotplug - as > >>> far as I'm aware there would still be a big open question there around > >>> notification mechanisms and how to describe them. The DT stuff so far > >>> has come from the PowerPC folks, so it's probably worth seeing what > >>> their ideas are. > >>> > >>> ACPI-wise I've always assumed/hoped that hotplug-related things should > >>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do" > >>> would be enough for us. > >> > >> As far as I can see in UEFI v2.8 -- and I had checked the spec before > >> dumping the table with the many question marks on Shameer --, all the > >> hot-plug language in the spec refers to USB and PCI hot-plug in the > >> preboot environment. There is not a single word about hot-plug at OS > >> runtime (regarding any device or component type), nor about memory > >> hot-plug (at any time). > >> > >> Looking to x86 appears valid -- so what does the Linux kernel expect on > >> that architecture, in the "ACPI" rows of the table? > > > > I could only answer from QEMU x86 perspective. > > QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820 > > because of different linux guests tend to cannibalize it, making it non > > unpluggable. The last culprit I recall was KASLR. > > > > So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if > > it's possible (it's probably hack (spec deosn't say anything about it) > > but it mostly works for Linux (plug/unplug) and Windows guest also > > fine with plug part (no unplug there)). > > I can accept this as a perfectly valid design. Which would mean, QEMU should > mark each hotpluggable RAM range in the DTB for the firmware with the > special new property, regardless of its initial ("cold") plugged-ness, and then > the firmware will not expose the range in the GCD memory space map, and > consequently in the UEFI memmap either. > > IOW, our table is, thus far: > > OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report as > DT/ACPI should report as > ----------------- ------------------ ------------------------------- ------------------------ > DT present > ABSENT ? > DT absent > ABSENT ? > ACPI present ABSENT > PRESENT > ACPI absent ABSENT > ABSENT > In the firmware, I only need to care about the GetMemoryMap() column, so I > can work with this. Thank you all for the inputs. I assume we will still report the DT cold plug case to kernel(hotpluggable & present). so the table will be something like this, OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report as DT/ACPI should report as ----------------- ------------------ ------------------------------- ------------------------ DT present ABSENT PRESENT DT absent ABSENT ABSENT ACPI present ABSENT PRESENT ACPI absent ABSENT ABSENT Can someone please file a feature request at > <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these > detais? Ok. I will do that. Thanks, Shameer > Thanks > Laszlo > > > > > As for physical systems, there are out there ones that do report > > hotpluggable RAM in GetMemoryMap(). > > > >> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest > >> re-raising the question on those lists too; at least the "ACPI" rows of > >> the table. > >> > >> Thanks! > >> Laszlo > >> > >>> > >>> Robin. > >>> > >>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed > >>>> any valid > >>>> points above). > >>>> > >>>> Thanks, > >>>> Shameer > >>>> [0] https://patchwork.kernel.org/cover/10890919/ > >>>> [1] https://patchwork.kernel.org/patch/10863299/ > >>>> [2] https://patchwork.kernel.org/patch/10890937/ > >>>> > >>>> > >> > >