On Wed, 8 May 2019 22:26:12 +0200 Laszlo Ersek <lersek@xxxxxxxxxx> wrote: > On 05/08/19 14:50, Robin Murphy wrote: > > Hi Shameer, > > > > On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote: > >> Hi, > >> > >> This series here[0] attempts to add support for PCDIMM in QEMU for > >> ARM/Virt platform and has stumbled upon an issue as it is not clear(at > >> least > >> from Qemu/EDK2 point of view) how in physical world the hotpluggable > >> memory is handled by kernel. > >> > >> The proposed implementation in Qemu, builds the SRAT and DSDT parts > >> and uses GED device to trigger the hotplug. This works fine. > >> > >> But when we added the DT node corresponding to the PCDIMM(cold plug > >> scenario), we noticed that Guest kernel see this memory during early boot > >> even if we are booting with ACPI. Because of this, hotpluggable memory > >> may end up in zone normal and make it non-hot-un-pluggable even if Guest > >> boots with ACPI. > >> > >> Further discussions[1] revealed that, EDK2 UEFI has no means to > >> interpret the > >> ACPI content from Qemu(this is designed to do so) and uses DT info to > >> build the GetMemoryMap(). To solve this, introduced "hotpluggable" > >> property > >> to DT memory node(patches #7 & #8 from [0]) so that UEFI can > >> differentiate > >> the nodes and exclude the hotpluggable ones from GetMemoryMap(). > >> > >> But then Laszlo rightly pointed out that in order to accommodate the > >> changes > >> into UEFI we need to know how exactly Linux expects/handles all the > >> hotpluggable memory scenarios. Please find the discussion here[2]. > >> > >> For ease, I am just copying the relevant comment from Laszlo below, > >> > >> /****** > >> "Given patches #7 and #8, as I understand them, the firmware cannot > >> distinguish > >> hotpluggable & present, from hotpluggable & absent. The firmware can > >> only > >> skip both hotpluggable cases. That's fine in that the firmware will > >> hog neither > >> type -- but is that OK for the OS as well, for both ACPI boot and DT > >> boot? > >> > >> Consider in particular the "hotpluggable & present, ACPI boot" case. > >> Assuming > >> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap > >> will not include the range despite it being present at boot. > >> Presumably, ACPI > >> will refer to the range somehow, however. Will that not confuse the OS? > >> > >> When Igor raised this earlier, I suggested that > >> hotpluggable-and-present should > >> be added by the firmware, but also allocated immediately, as > >> EfiBootServicesData > >> type memory. This will prevent other drivers in the firmware from > >> allocating AcpiNVS > >> or Reserved chunks from the same memory range, the UEFI memmap will > >> contain > >> the range as EfiBootServicesData, and then the OS can release that > >> allocation in > >> one go early during boot. > >> > >> But this really has to be clarified from the Linux kernel's > >> expectations. Please > >> formalize all of the following cases: > >> > >> OS boot (DT/ACPI) hotpluggable & ... GetMemoryMap() should report > >> as DT/ACPI should report as > >> ----------------- ------------------ > >> ------------------------------- ------------------------ > >> DT present ? ? > >> DT absent ? ? > >> ACPI present ? ? > >> ACPI absent ? ? > >> > >> Again, this table is dictated by Linux." > >> > >> ******/ > >> > >> Could you please take a look at this and let us know what is expected > >> here from > >> a Linux kernel view point. > > > > For arm64, so far we've not even been considering DT-based hotplug - as > > far as I'm aware there would still be a big open question there around > > notification mechanisms and how to describe them. The DT stuff so far > > has come from the PowerPC folks, so it's probably worth seeing what > > their ideas are. > > > > ACPI-wise I've always assumed/hoped that hotplug-related things should > > be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do" > > would be enough for us. > > As far as I can see in UEFI v2.8 -- and I had checked the spec before > dumping the table with the many question marks on Shameer --, all the > hot-plug language in the spec refers to USB and PCI hot-plug in the > preboot environment. There is not a single word about hot-plug at OS > runtime (regarding any device or component type), nor about memory > hot-plug (at any time). > > Looking to x86 appears valid -- so what does the Linux kernel expect on > that architecture, in the "ACPI" rows of the table? I could only answer from QEMU x86 perspective. QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820 because of different linux guests tend to cannibalize it, making it non unpluggable. The last culprit I recall was KASLR. So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if it's possible (it's probably hack (spec deosn't say anything about it) but it mostly works for Linux (plug/unplug) and Windows guest also fine with plug part (no unplug there)). As for physical systems, there are out there ones that do report hotpluggable RAM in GetMemoryMap(). > Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest > re-raising the question on those lists too; at least the "ACPI" rows of > the table. > > Thanks! > Laszlo > > > > > Robin. > > > >> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed > >> any valid > >> points above). > >> > >> Thanks, > >> Shameer > >> [0] https://patchwork.kernel.org/cover/10890919/ > >> [1] https://patchwork.kernel.org/patch/10863299/ > >> [2] https://patchwork.kernel.org/patch/10890937/ > >> > >> >