On Mon, Nov 2, 2020 at 9:53 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 02.11.20 17:17, Vikram Sethi wrote:
> > Hi David,
> >> From: David Hildenbrand <david@xxxxxxxxxx>
> >> On 31.10.20 17:51, Dan Williams wrote:
> >>> On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 30.10.20 21:37, Dan Williams wrote:
> >>>>> On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi <vsethi@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
> >>>>>> Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
> >>>>>> devices which are available/plugged in at boot. A type 2 CXL device can be simply
> >>>>>> thought of as an accelerator with coherent device memory, that also has a
> >>>>>> CXL.cache to cache system memory.
> >>>>>>
> >>>>>> One could envision that BIOS/UEFI could expose the HDM in EFI memory map
> >>>>>> as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
> >>>>>> on some architectures (arm64) EFI conventional memory available at kernel boot
> >>>>>> memory cannot be offlined, so this may not be suitable on all architectures.
> >>>>>
> >>>>> That seems an odd restriction. Add David, linux-mm, and linux-acpi as
> >>>>> they might be interested / have comments on this restriction as well.
> >>>>>
> >>>>
> >>>> I am missing some important details.
> >>>>
> >>>> a) What happens after offlining? Will the memory be remove_memory()'ed?
> >>>> Will the device get physically unplugged?
> >>>>
> > Not always IMO. If the device was getting reset, the HDM memory is going to be
> > unavailable while device is reset. Offlining the memory around the reset would
>
> Ouch, that speaks IMHO completely against exposing it as System RAM as
> default.
>
> > be sufficient, but depending if driver had done the add_memory in probe,
> > it perhaps would be onerous to have to remove_memory as well before reset,
> > and then add it back after reset. I realize you’re saying such a procedure
> > would be abusing hotplug framework, and we could perhaps require that memory
> > be removed prior to reset, but not clear to me that it *must* be removed for
> > correctness.
> >
> > Another usecase of offlining without removing HDM could be around
> > Virtualization/passing entire device with its memory to a VM. If device was
> > being used in the host kernel, and is then unbound, and bound to vfio-pci
> > (vfio-cxl?), would we expect vfio-pci to add_memory_driver_managed?
>
> At least for passing through memory to VMs (via KVM), you don't actually
> need struct pages / memory exposed to the buddy via
> add_memory_driver_managed(). Actually, doing that sounds like the wrong
> approach.
>
> E.g., you would "allocate" the memory via devdax/dax_hmat and directly
> map the resulting device into guest address space. At least that's what
> some people are doing with

...and Joao is working to see if the host kernel can skip allocating
'struct page', or allocate it on demand if the guest ever requests host
kernel services on its memory. Typically the guest does not, so the host
'struct page' space for devdax memory ranges goes wasted.
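For a sense of scale, here is a rough sketch of the memmap overhead in question. It assumes 4 KiB base pages and a 64-byte struct page, which is typical for x86_64 but varies with architecture and kernel config. The helper name memmap_bytes() and the 64 GiB range used below are hypothetical illustrations, not anything from this thread or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not kernel constants: 4 KiB base pages and a 64-byte
 * struct page (common on x86_64; both depend on arch and config). */
#define ASSUMED_PAGE_SIZE        4096ULL
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL

/* Hypothetical helper: bytes of struct page (memmap) needed to describe
 * a page-aligned physical range of range_bytes. */
static uint64_t memmap_bytes(uint64_t range_bytes)
{
    /* The range is expected to cover whole base pages. */
    assert(range_bytes % ASSUMED_PAGE_SIZE == 0);

    uint64_t nr_pages = range_bytes / ASSUMED_PAGE_SIZE;
    return nr_pages * ASSUMED_STRUCT_PAGE_SIZE;
}
```

Under those assumptions the overhead is 64/4096 = 1/64 of the range, so memmap_bytes(64ULL << 30) comes to 1 GiB of host memory spent on struct page for a 64 GiB devdax range that the guest may never ask the host kernel to touch.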