On Tuesday, 3 November 2020 6:25:23 AM AEDT Vikram Sethi wrote:
> > > > be sufficient, but depending if driver had done the add_memory in probe,
> > > > it perhaps would be onerous to have to remove_memory as well before reset,
> > > > and then add it back after reset. I realize you're saying such a procedure
> > > > would be abusing hotplug framework, and we could perhaps require that memory
> > > > be removed prior to reset, but not clear to me that it *must* be removed for
> > > > correctness.

I'm not sure exactly what you meant by "unavailable", but on some platforms
(eg. PowerPC) it must be removed for correctness if hardware access to the
memory is going away for any period of time.

remove_memory() is what makes it safe to physically remove the memory as it
triggers things like cache flushing. Without this PPC would see memory failure
machine checks if it ever tried to write back any dirty cache lines to the now
inaccessible memory.

> > > > Another usecase of offlining without removing HDM could be around
> > > > Virtualization/passing entire device with its memory to a VM. If device was
> > > > being used in the host kernel, and is then unbound, and bound to vfio-pci
> > > > (vfio-cxl?), would we expect vfio-pci to add_memory_driver_managed?
> > >
> > > At least for passing through memory to VMs (via KVM), you don't actually
> > > need struct pages / memory exposed to the buddy via
> > > add_memory_driver_managed(). Actually, doing that sounds like the wrong
> > > approach.
> > >
> > > E.g., you would "allocate" the memory via devdax/dax_hmat and directly
> > > map the resulting device into guest address space. At least that's what
> > > some people are doing with
> How does memory_failure forwarding to guest work in that case?
> IIUC it doesn't without a struct page in the host.
> For normal memory, when VM consumes poison, host kernel signals
> Userspace with SIGBUS and si-code that says Action Required, which
> QEMU injects to the guest.
> IBM had done something like you suggest with coherent GPU memory and IIUC
> memory_failure forwarding to guest VM does not work there.
>
> kernel https://lkml.org/lkml/2018/12/20/103
> QEMU: https://patchwork.kernel.org/patch/10831455/

The above patches simply allow the coherent GPU physical memory ranges to get
mapped into a guest VM in a similar way to an MMIO range (ie. without a struct
page in the host). So you are correct that they do not deal with forwarding
failures to a guest VM.

Any GPU memory failure on PPC would currently get sent to the host in the same
way as a normal system memory failure (ie. a machine check). So in theory
notification to a guest would work the same as for a normal system memory
failure. I say in theory because, when I last looked at this some time back, a
guest kernel on PPC was not notified of memory errors.

 - Alistair

> I would think we *do want* memory errors to be sent to a VM.
>
> > ...and Joao is working to see if the host kernel can skip allocating
> > 'struct page' or do it on demand if the guest ever requests host
> > kernel services on its memory. Typically it does not so host 'struct
> > page' space for devdax memory ranges goes wasted.
> Is memory_failure forwarded to and handled by guest?
>
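
For reference, a minimal sketch of the ordering argued for above: remove the
HDM range before the reset so caches get flushed while the memory is still
reachable, then re-add it afterwards. This uses the memory hotplug API as of
roughly the v5.9/v5.10 kernels (exact signatures have changed across
versions), and struct example_dev, its fields and example_do_reset() are
hypothetical placeholders, not code from the patches under discussion.

/*
 * Hypothetical sketch only -- not code from the patches discussed above.
 * API as of roughly v5.9/v5.10; exact signatures differ between versions.
 */
#include <linux/memory_hotplug.h>

/* Hypothetical device state, not a real driver structure. */
struct example_dev {
	int nid;		/* NUMA node the HDM range was added to */
	u64 hdm_start;		/* physical start of the HDM range */
	u64 hdm_size;		/* size of the HDM range */
};

/* Placeholder for the device-specific reset sequence. */
static void example_do_reset(struct example_dev *dev)
{
}

static int example_reset_with_hdm(struct example_dev *dev)
{
	int rc;

	/*
	 * Offline and remove the range first. This is what triggers the
	 * cache flushing; skipping it risks writeback of dirty lines to
	 * memory that is inaccessible during the reset (a machine check
	 * on PPC).
	 */
	rc = offline_and_remove_memory(dev->nid, dev->hdm_start,
				       dev->hdm_size);
	if (rc)
		return rc;

	/* The HDM range may be unreachable while this runs. */
	example_do_reset(dev);

	/* Re-add the range once the device is accessible again. */
	return add_memory_driver_managed(dev->nid, dev->hdm_start,
					 dev->hdm_size,
					 "System RAM (example)");
}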