On 31.10.20 17:51, Dan Williams wrote:
On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 30.10.20 21:37, Dan Williams wrote:
On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi <vsethi@xxxxxxxxxx> wrote:
Hello,
I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
devices which are available/plugged in at boot. A type 2 CXL device can be simply
thought of as an accelerator with coherent device memory, that also has a
CXL.cache to cache system memory.
One could envision that BIOS/UEFI could expose the HDM in EFI memory map
as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
on some architectures (arm64) EFI conventional memory available at kernel boot
cannot be offlined, so this may not be suitable on all architectures.
That seems an odd restriction. Add David, linux-mm, and linux-acpi as
they might be interested / have comments on this restriction as well.
I am missing some important details.
a) What happens after offlining? Will the memory be remove_memory()'ed?
Will the device get physically unplugged?
b) What's the general purpose of the memory and its intended usage when
*not* exposed as system RAM? What's the main point of treating it like
ordinary system RAM as default?
Also, can you be sure that you can offline that memory? If it's
ZONE_NORMAL (as usually all system RAM in the initial map), there are no
such guarantees, especially once the system has run for long enough, but
also in other cases (e.g., shuffling), or if allocation policies change
in the future.
So I *guess* you would already have to use kernel cmdline hacks like
"movablecore" to make it work. In that case, you can directly specify
what you *actually* want (which I am not sure yet I completely
understood) - e.g., something like "memmap=16G!16G" ... or something
similar.
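
To make the "memmap=16G!16G" example concrete, here is a minimal sketch of how that string maps to a physical range, going by the "memmap=nn[KMG]!ss[KMG]" form in the kernel-parameters documentation (mark the region from ss to ss+nn as protected). The helper names are hypothetical, and the parser only accepts decimal numbers with an optional K/M/G suffix, which is narrower than what the kernel accepts.

```python
import re

# Binary multiples for the K/M/G suffixes (assumption: decimal digits only).
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '16G' or '512M' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_memmap_protected(arg):
    """Parse 'memmap=nn[KMG]!ss[KMG]' into (start, end), end exclusive.

    Hypothetical helper for illustration; it is not kernel code.
    """
    key, sep, value = arg.partition("=")
    if key != "memmap" or sep != "=" or "!" not in value:
        raise ValueError(f"not a memmap=nn!ss argument: {arg!r}")
    size_text, start_text = value.split("!", 1)
    size = parse_size(size_text)
    start = parse_size(start_text)
    return start, start + size


if __name__ == "__main__":
    start, end = parse_memmap_protected("memmap=16G!16G")
    print(f"protected range: {start:#x}-{end - 1:#x}")
```

With the example from above, the region starts at the 16 GiB mark and is 16 GiB long, so it never shows up as ordinary system RAM in the first place.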
I consider offlining+removing *boot* memory without physically unplugging it
(e.g., a DIMM getting unplugged) to be abusing the memory hotunplug
infrastructure. It's a different thing when manually adding memory like
dax_kmem does via add_memory_driver_managed().
Now, back to your original question: arm64 does not support physically
unplugging DIMMs that were part of the initial map. If you'd reboot
after unplugging a DIMM, your system would crash. We enforce that by
disallowing offlining of boot memory - we could also try to handle it in
ACPI code. But again, most uses of offlining+removing boot memory are
abusing the memory hotunplug infrastructure and should rather be solved
cleanly via a different mechanism (firmware, kernel cmdline, ...).
Just recently discussed in
https://lkml.kernel.org/r/de8388df2fbc5a6a33aab95831ba7db4@xxxxxxxxxxxxxx
Further, the device driver associated with the type 2 device/accelerator may
want to save off a chunk of HDM for driver private use.
So it seems the more appropriate model may be something like dev dax model
where the device driver probe/open calls add_memory_driver_managed, and
the driver could choose how much of the HDM it wants to reserve and how
much to make generally available for application mmap/malloc.
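
As a rough illustration of the split described above, the following sketch carves an HDM range into a driver-private part and a part that could be handed to add_memory_driver_managed(). It only does the address arithmetic. The function names and the 128 MiB memory block size are assumptions made for the example (the real block size is architecture dependent), and the alignment step reflects my understanding that hot-added ranges have to be aligned to the memory block size.

```python
# Assumed memory block size for illustration only; the real value is
# architecture and configuration dependent.
MEMORY_BLOCK_SIZE = 128 << 20


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def align_down(value, alignment):
    """Round value down to the previous multiple of alignment."""
    return value // alignment * alignment


def split_hdm(base, size, private_bytes, block_size=MEMORY_BLOCK_SIZE):
    """Split [base, base + size) into a private and a public region.

    Hypothetical helper: the driver keeps at least private_bytes at the
    start of the range; the public region is shrunk to block alignment so
    it could be added as driver-managed system RAM.
    Returns ((private_start, private_size), (public_start, public_size)).
    """
    if private_bytes > size:
        raise ValueError("private reservation exceeds HDM size")
    end = base + size
    public_start = align_up(base + private_bytes, block_size)
    public_end = align_down(end, block_size)
    if public_start >= public_end:
        # Nothing block-aligned is left; keep the whole range private.
        return (base, size), (end, 0)
    return (base, public_start - base), (public_start, public_end - public_start)


if __name__ == "__main__":
    private, public = split_hdm(4 << 30, 1 << 30, 100 << 20)
    print("private:", hex(private[0]), private[1] >> 20, "MiB")
    print("public: ", hex(public[0]), public[1] >> 20, "MiB")
```

Any unaligned tail past the public region simply stays with the driver in this sketch; a real driver would have to decide what to do with it.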
Sure, it can always be driver managed. The trick will be getting the
platform firmware to agree to not map it by default, but I suspect
you'll have a hard time convincing platform-firmware to take that
stance. The BIOS does not know, and should not care what OS is booting
when it produces the memory map. So I think CXL memory unplug after
the fact is more realistic than trying to get the BIOS not to map it.
So, to me it looks like arm64 needs to reconsider its unplug stance.
My personal opinion is, if memory isn't just "ordinary system RAM", then
let the system know early that memory is special (as we do with
soft-reserved).
Ideally, you could configure the firmware (e.g., via BIOS setup) on what
to do; that's the cleanest solution, but I can understand that's rather
hard to achieve.
Yes, my hope, which is about the most influence I can have on
platform-firmware implementations, is that it marks CXL attached
memory as soft-reserved by default and allows OS policy to decide where it
goes. Barring that, for the configuration that Vikram mentioned, the
only other way to get this differentiated / not-ordinary system-ram
back to being driver managed would be to unplug it. The soft-reserved
path is cleaner.
If we already need kernel cmdline parameters (movablecore), we can
handle this differently via the cmdline. That sets expectations for
people implementing the firmware - we shouldn't make their life too easy
with such decisions.
The paragraph started with
"One could envision that BIOS/UEFI could expose the HDM in EFI memory
map ..." Let's not envision it, but instead suggest that people not do it ;)
--
Thanks,
David / dhildenb