RE: Onlining CXL Type2 device coherent memory

Hi David,
> From: David Hildenbrand <david@xxxxxxxxxx>
> On 31.10.20 17:51, Dan Williams wrote:
> > On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 30.10.20 21:37, Dan Williams wrote:
> >>> On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi <vsethi@xxxxxxxxxx> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
> >>>> Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
> >>>> devices which are available/plugged in at boot. A type 2 CXL device can be simply
> >>>> thought of as an accelerator with coherent device memory, that also has a
> >>>> CXL.cache to cache system memory.
> >>>>
> >>>> One could envision that BIOS/UEFI could expose the HDM in EFI memory map
> >>>> as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
> >>>> on some architectures (arm64) EFI conventional memory available at kernel boot
> >>>> cannot be offlined, so this may not be suitable on all architectures.
> >>>
> >>> That seems an odd restriction. Add David, linux-mm, and linux-acpi as
> >>> they might be interested / have comments on this restriction as well.
> >>>
> >>
> >> I am missing some important details.
> >>
> >> a) What happens after offlining? Will the memory be remove_memory()'ed?
> >> Will the device get physically unplugged?
> >>
Not always, IMO. If the device is being reset, the HDM is going to be
unavailable while the device is in reset. Offlining the memory around the
reset would be sufficient, but if the driver had done the add_memory() in
probe, it would be onerous to also have to remove_memory() before the
reset and then add it back afterwards. I realize you're saying such a
procedure would be abusing the hotplug framework, and we could perhaps
require that the memory be removed prior to reset, but it is not clear to
me that it *must* be removed for correctness.
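For concreteness, the offline-around-reset flow I have in mind could be
driven from userspace through the memory hotplug sysfs interface. A
minimal sketch follows; the block names and the SYSFS_MEM/HDM_BLOCKS
variables are illustrative only, and a real tool would derive the block
list from the HDM physical address range and block_size_bytes:

```shell
#!/bin/sh
# Sketch: toggle the memory blocks backing a device's HDM via the memory
# hotplug sysfs interface. SYSFS_MEM and HDM_BLOCKS are parameterized and
# the defaults below are illustrative, not real block assignments.
SYSFS_MEM="${SYSFS_MEM:-/sys/devices/system/memory}"
HDM_BLOCKS="${HDM_BLOCKS:-memory32 memory33}"

# Write "offline" or "online" to each HDM memory block's state file.
set_hdm_state() {
    for blk in $HDM_BLOCKS; do
        echo "$1" > "$SYSFS_MEM/$blk/state"
    done
}

# Usage around a reset (as root, on a real system):
#   set_hdm_state offline
#   ... device-specific reset ...
#   set_hdm_state online
```

Note the memory is only offlined, never remove_memory()'ed, which is
exactly the distinction in question above.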

Another use case for offlining without removing the HDM could be
virtualization: passing the entire device, with its memory, to a VM. If
the device was being used in the host kernel, and is then unbound and
bound to vfio-pci (vfio-cxl?), would we expect vfio-pci to call
add_memory_driver_managed()? IMO the coherent device memory should be
onlined in the host, for example to handle memory_failure() flows and to
pass the poison on to userspace/the VM when the VM consumes poison on a
load from "bad" HDM. I realize it *could* be done with vfio
adding+onlining the memory to the host kernel, and that perhaps makes
sense if the device had never been used in the host kernel/bound to its
"native" device driver to begin with. Alex?
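For reference, the unbind-and-rebind step being discussed is the usual
driver_override dance. A sketch (the BDF is illustrative, and SYSFS_PCI
is parameterized only so the flow can be exercised outside a real sysfs);
any HDM the native driver had added would of course need to be
offlined/removed before the unbind:

```shell
#!/bin/sh
# Sketch: hand a PCI device over to vfio-pci via driver_override.
# SYSFS_PCI is parameterized for illustration; on a real system it is
# /sys/bus/pci and this must run as root.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

bind_to_vfio() {
    bdf="$1"
    # Detach from the native driver, if one is currently bound.
    if [ -e "$SYSFS_PCI/devices/$bdf/driver/unbind" ]; then
        echo "$bdf" > "$SYSFS_PCI/devices/$bdf/driver/unbind"
    fi
    # Steer the next probe to vfio-pci, then trigger the probe.
    echo vfio-pci > "$SYSFS_PCI/devices/$bdf/driver_override"
    echo "$bdf" > "$SYSFS_PCI/drivers_probe"
}

# Usage (real sysfs, as root):  bind_to_vfio 0000:17:00.0
```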

> >> b) What's the general purpose of the memory and its intended usage when
> >> *not* exposed as system RAM? What's the main point of treating it like
> >> ordinary system RAM as default?
> >>
> >> Also, can you be sure that you can offline that memory? If it's
> >> ZONE_NORMAL (as usually all system RAM in the initial map), there are no
> >> such guarantees, especially once the system ran for long enough, but
> >> also in other cases (e.g., shuffling), or if allocation policies change
> >> in the future.
> >>
> >> So I *guess* you would already have to use kernel cmdline hacks like
> >> "movablecore" to make it work. In that case, you can directly specify
> >> what you *actually* want (which I am not sure yet I completely
> >> understood) - e.g., something like "memmap=16G!16G" ... or something
> >> similar.
> >>
> >> I consider offlining+removing *boot* memory to not physically unplug it
> >> (e.g., a DIMM getting unplugged) abusing the memory hotunplug
> >> infrastructure. It's a different thing when manually adding memory like
> >> dax_kmem does via add_memory_driver_managed().
> >>
> >>
> >> Now, back to your original question: arm64 does not support physically
> >> unplugging DIMMs that were part of the initial map. If you'd reboot
> >> after unplugging a DIMM, your system would crash. We achieve that by
> >> disallowing to offline boot memory - we could also try to handle it in
> >> ACPI code. But again, most uses of offlining+removing boot memory are
> >> abusing the memory hotunplug infrastructure and should rather be solved
> >> cleanly via a different mechanism (firmware, kernel cmdline, ...).
> >>
> >> Just recently discussed in
> >>
> >> https://lkml.kernel.org/r/de8388df2fbc5a6a33aab95831ba7db4@xxxxxxxxxxxxxx
> >>
> >>>> Further, the device driver associated with the type 2 device/accelerator may
> >>>> want to save off a chunk of HDM for driver private use.
> >>>> So it seems the more appropriate model may be something like the dev dax model
> >>>> where the device driver probe/open calls add_memory_driver_managed, and
> >>>> the driver could choose how much of the HDM it wants to reserve and how
> >>>> much to make generally available for application mmap/malloc.
> >>>
> >>> Sure, it can always be driver managed. The trick will be getting the
> >>> platform firmware to agree to not map it by default, but I suspect
> >>> you'll have a hard time convincing platform-firmware to take that
> >>> stance. The BIOS does not know, and should not care what OS is booting
> >>> when it produces the memory map. So I think CXL memory unplug after
> >>> the fact is more realistic than trying to get the BIOS not to map it.
> >>> So, to me it looks like arm64 needs to reconsider its unplug stance.
> >>
> >> My personal opinion is, if memory isn't just "ordinary system RAM", then
> >> let the system know early that memory is special (as we do with
> >> soft-reserved).
> >>
> >> Ideally, you could configure the firmware (e.g., via BIOS setup) on what
> >> to do, that's the cleanest solution, but I can understand that's rather
> >> hard to achieve.
> >
> > Yes, my hope, which is about the most influence I can have on
> > platform-firmware implementations, is that it marks CXL attached
> > memory as soft-reserved by default and allow OS policy decide where it
> > goes. Barring that, for the configuration that Vikram mentioned, the
> > only other way to get this differentiated / not-ordinary system-ram
> > back to being driver managed would be to unplug it. The soft-reserved
> > path is cleaner.
> 
> If we already need kernel cmdline parameters (movablecore), we can
> handle this differently via the cmdline. That sets expectations for
> people implementing the firmware - we shouldn't make their life too easy
> with such decisions.
> 
> The paragraph started with
> 
> "One could envision that BIOS/UEFI could expose the HDM in EFI memory
> map ..." Let's not envision it, but instead suggest people to not do it ;)
> 

Sounds good to me! Mahesh, let's line this topic up for discussion in a
CXL UEFI/ACPI subteam meeting, and find a way to add an ECR
implementation note to the spec recommending that UEFI/BIOS NOT expose
HDM in the EFI memory map.

Thanks,
Vikram



