>>> On 28.11.17 at 11:17, <christian.koenig@xxxxxxx> wrote:
> On 28.11.2017 at 10:46, Jan Beulich wrote:
>>>>> On 28.11.17 at 10:12, <christian.koenig@xxxxxxx> wrote:
>>> In theory the BIOS would search for address space and won't find
>>> anything, so the hotplug operation should fail even before it reaches
>>> the kernel in the first place.
>> How would the BIOS know what the OS does or plans to do?
>
> As far as I know the ACPI BIOS should work directly with the register
> content.
>
> So when we update the register content to enable the MMIO decoding the
> BIOS should know that as well.

I'm afraid I don't follow: During memory hotplug, surely you don't
expect the BIOS to do a PCI bus scan? Plus even if it did, it would be
racy - some device could, at this very moment, have memory decoding
disabled, just for the OS to re-enable it a millisecond later. Yet
looking at BAR values is meaningless when memory decode of a device is
disabled.

>> I think
>> it's the other way around - the OS needs to avoid using any regions
>> for MMIO which are marked as hotpluggable in SRAT.
>
> I was under the impression that this is exactly what
> acpi_numa_memory_affinity_init() does.

Perhaps, except that (when I last looked) insufficient state is (was)
being recorded to have that information readily available at the time
MMIO space above 4GB needs to be allocated for some device.

>> Since there is
>> no vNUMA yet for Xen Dom0, that would need special handling.
>
> I think that the problem is rather that SRAT is NUMA specific and if I'm
> not totally mistaken the content is ignored when NUMA support isn't
> compiled into the kernel.
>
> When Xen steals some memory from Dom0 by hooking up itself into the e820
> call then I would say the cleanest way is to report this memory in e820
> as reserved as well. But take that with a grain of salt, I'm seriously
> not a Xen expert.

The E820 handling in PV Linux is all fake anyway - there's a single
chunk of memory given to a PV guest (including Dom0), contiguous in
what PV guests know as "physical address space" (not to be mixed up
with "machine address space", which is where MMIO needs to be
allocated from). Xen code in the kernel then mimics an E820 matching
the host one, moving around pieces of memory in physical address
space if necessary.

Since Dom0 knows the machine E820, MMIO allocation shouldn't need to
be much different there from the non-Xen case.

Jan
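
To make the point about MMIO allocation above 4GB more concrete, here
is a minimal sketch of picking a window that avoids both the machine
E820 entries and any SRAT ranges flagged hot-pluggable. It is only an
illustration, not kernel or Xen code: find_mmio_window(), the range
lists and all addresses below are hypothetical, and the BAR size is
assumed to be a power of two.

```python
GIB = 1 << 30
FOUR_GIB = 4 * GIB


def find_mmio_window(size, occupied, limit, floor=FOUR_GIB):
    """Return the lowest size-aligned base in [floor, limit) that does not
    overlap any half-open (start, end) range in `occupied`, or None.

    `occupied` stands for machine address ranges that must not be used for
    MMIO: E820 entries plus SRAT ranges marked hot-pluggable (hypothetical
    inputs; a real kernel tracks these in its own resource tree).
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    mask = size - 1
    base = (floor + mask) & ~mask          # align the first candidate up
    for start, end in sorted(occupied):
        if end <= base:
            continue                       # range lies entirely below us
        if start >= base + size:
            break                          # candidate window is free
        base = (end + mask) & ~mask        # overlap: skip past and realign
    return base if base + size <= limit else None


# Hypothetical machine layout: RAM at 4-8 GiB, hot-pluggable at 8-16 GiB.
e820_ranges = [(0, 2 * GIB), (4 * GIB, 8 * GIB)]
srat_hotplug = [(8 * GIB, 16 * GIB)]

# Looking at E820 alone would place the BAR inside the hot-pluggable range.
naive = find_mmio_window(GIB, e820_ranges, 64 * GIB)
# Taking SRAT into account moves it above that range.
safe = find_mmio_window(GIB, e820_ranges + srat_hotplug, 64 * GIB)
print(hex(naive), hex(safe))
```

With this made-up layout the E820-only lookup lands at 8 GiB, right
where memory could later be hot-added, while the lookup that also
consults the SRAT ranges lands at 16 GiB.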