Re: [v1 2/2] device-dax: "Hotremove" persistent memory that is used like normal RAM

On Sat, Apr 20, 2019 at 8:36 AM Pavel Tatashin
<pasha.tatashin@xxxxxxxxxx> wrote:
>
> Persistent memory can now be used like regular RAM, but currently
> there is no way to remove this memory until the machine is
> rebooted.
>
> This work expands the functionality to also allow hot removing
> previously hotplugged persistent memory, and recover the device for use
> for other purposes.
>
> To hotremove persistent memory, the management software must unbind it
> from the device-dax/kmem driver:
>
>             echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
> ---
>  drivers/dax/dax-private.h |  2 +
>  drivers/dax/kmem.c        | 77 +++++++++++++++++++++++++++++++++++++--
>  2 files changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
> index a45612148ca0..999aaf3a29b3 100644
> --- a/drivers/dax/dax-private.h
> +++ b/drivers/dax/dax-private.h
> @@ -53,6 +53,7 @@ struct dax_region {
>   * @pgmap - pgmap for memmap setup / lifetime (driver owned)
>   * @ref: pgmap reference count (driver owned)
>   * @cmp: @ref final put completion (driver owned)
> + * @dax_kmem_res: physical address range of hotadded DAX memory
>   */
>  struct dev_dax {
>         struct dax_region *region;
> @@ -62,6 +63,7 @@ struct dev_dax {
>         struct dev_pagemap pgmap;
>         struct percpu_ref ref;
>         struct completion cmp;
> +       struct resource *dax_kmem_res;
>  };
>
>  static inline struct dev_dax *to_dev_dax(struct device *dev)
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 4c0131857133..026c34f93df5 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -71,21 +71,90 @@ int dev_dax_kmem_probe(struct device *dev)
>                 kfree(new_res);
>                 return rc;
>         }
> +       dev_dax->dax_kmem_res = new_res;
>
>         return 0;
>  }
>
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/*
> + * Offline device-dax's memory_blocks. If a memory_block cannot be offlined
> + * a warning is printed and an error is returned. dax hotremove can succeed
> + * only when every memory_block is offline.
> + */
> +static int
> +offline_memblock_cb(struct memory_block *mem, void *arg)
> +{
> +       struct device *dev = (struct device *)arg;
> +       int rc = device_offline(&mem->dev);
> +
> +       if (rc < 0) {
> +               unsigned long spfn = section_nr_to_pfn(mem->start_section_nr);
> +               unsigned long epfn = section_nr_to_pfn(mem->end_section_nr);
> +               phys_addr_t spa = spfn << PAGE_SHIFT;
> +               phys_addr_t epa = epfn << PAGE_SHIFT;
> +
> +               dev_warn(dev, "could not offline memory block [%pa-%pa]\n",
> +                        &spa, &epa);
> +
> +               return rc;
> +       }
> +
> +       return 0;
> +}
> +
> +static int dev_dax_kmem_remove(struct device *dev)
> +{
> +       struct dev_dax *dev_dax = to_dev_dax(dev);
> +       struct resource *res = dev_dax->dax_kmem_res;
> +       resource_size_t kmem_start;
> +       resource_size_t kmem_size;
> +       unsigned long start_pfn;
> +       unsigned long end_pfn;
> +       int rc;
> +
> +       /*
> +        * dax kmem resource does not exist, means memory was never hotplugged.
> +        * So, nothing to do here.
> +        */
> +       if (!res)
> +               return 0;
> +
> +       kmem_start = res->start;
> +       kmem_size = resource_size(res);
> +       start_pfn = kmem_start >> PAGE_SHIFT;
> +       end_pfn = start_pfn + (kmem_size >> PAGE_SHIFT) - 1;
> +
> +       /* Walk and offline every single memory_block of the dax region. */
> +       lock_device_hotplug();
> +       rc = walk_memory_range(start_pfn, end_pfn, dev, offline_memblock_cb);
> +       unlock_device_hotplug();
> +       if (rc)
> +               return rc;

This potential early return is the reason why memory hotremove is not
reliable with respect to the driver core. If this walk fails to offline
any memory block, that memory will still be online, but the driver core
has no provision for device unbind failing. The unbind will proceed
regardless, while the memory stays pinned.
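
To make the failure mode concrete, here is a minimal userspace sketch.
It is not kernel code: every toy_* name is hypothetical, and the
assumption that the core discards the ->remove() return value is taken
from the description above rather than from any particular kernel
version. toy_remove() has the same shape as dev_dax_kmem_remove() in
the patch (stop at the first block that fails to offline), and
toy_unbind() models a core that tears down the binding no matter what.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory block; not a kernel type. */
struct toy_block {
	bool online;
	bool pinned;	/* models pages that cannot be migrated away */
};

/* Hypothetical stand-in for a device bound to a kmem-like driver. */
struct toy_device {
	struct toy_block *blocks;
	size_t nr_blocks;
	bool bound;
};

/* Offline one block; fails with -EBUSY while the block is pinned. */
static int toy_offline_block(struct toy_block *b)
{
	if (b->pinned)
		return -EBUSY;
	b->online = false;
	return 0;
}

/* Same shape as the patch's remove path: bail out on the first failure. */
static int toy_remove(struct toy_device *dev)
{
	for (size_t i = 0; i < dev->nr_blocks; i++) {
		int rc = toy_offline_block(&dev->blocks[i]);

		if (rc)
			return rc;
	}
	return 0;
}

/* Assumed core behavior: the remove() result is dropped, unbind goes on. */
static void toy_unbind(struct toy_device *dev)
{
	assert(dev != NULL);
	(void)toy_remove(dev);
	dev->bound = false;
}

/* Count the blocks that are still online. */
static size_t toy_online_count(const struct toy_device *dev)
{
	size_t n = 0;

	for (size_t i = 0; i < dev->nr_blocks; i++)
		if (dev->blocks[i].online)
			n++;
	return n;
}
```

With three online blocks where only the middle one is pinned,
toy_unbind() leaves the device unbound while the pinned block and the
block after it are still online, and nothing remains bound to them that
could retry the offline later.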



