On Fri, 2016-03-04 at 18:23 -0800, Dan Williams wrote:
> On Fri, Mar 4, 2016 at 6:48 PM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > On Thu, 2016-03-03 at 13:53 -0800, Dan Williams wrote:
> > > On a platform where 'Persistent Memory' and 'System RAM' are mixed
> > > within a given sparsemem section, trim the namespace and notify about
> > > the sub-optimal alignment.
> > >
> > > Cc: Toshi Kani <toshi.kani@xxxxxxx>
> > > Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > ---
> > >  drivers/nvdimm/namespace_devs.c |    7 ++
> > >  drivers/nvdimm/pfn.h            |   10 ++-
> > >  drivers/nvdimm/pfn_devs.c       |    5 ++
> > >  drivers/nvdimm/pmem.c           |  125 ++++++++++++++++++++++++++++-----------
> > >  4 files changed, 111 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> > > index 8ebfcaae3f5a..463756ca2d4b 100644
> > > --- a/drivers/nvdimm/namespace_devs.c
> > > +++ b/drivers/nvdimm/namespace_devs.c
> > > @@ -133,6 +133,7 @@ bool nd_is_uuid_unique(struct device *dev, u8 *uuid)
> > >  bool pmem_should_map_pages(struct device *dev)
> > >  {
> > >  	struct nd_region *nd_region = to_nd_region(dev->parent);
> > > +	struct nd_namespace_io *nsio;
> > >
> > >  	if (!IS_ENABLED(CONFIG_ZONE_DEVICE))
> > >  		return false;
> > > @@ -143,6 +144,12 @@ bool pmem_should_map_pages(struct device *dev)
> > >  	if (is_nd_pfn(dev) || is_nd_btt(dev))
> > >  		return false;
> > >
> > > +	nsio = to_nd_namespace_io(dev);
> > > +	if (region_intersects(nsio->res.start, resource_size(&nsio->res),
> > > +				IORESOURCE_SYSTEM_RAM,
> > > +				IORES_DESC_NONE) == REGION_MIXED)
> >
> > Should this be != REGION_DISJOINT to be safe?
>
> Actually, it's ok.  It doesn't need to be disjoint.  The problem is
> mixing an mm-zone within a given section.
> If the region intersects system-ram then devm_memremap_pages() is a
> no-op and we can use the existing page allocation and linear mapping.

Oh, I see.

> > > +		return false;
> > > +
> >
> >  :
> >
> > > @@ -304,21 +311,56 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> > >  	}
> > >
> > >  	memset(pfn_sb, 0, sizeof(*pfn_sb));
> > > -	npfns = (pmem->size - SZ_8K) / SZ_4K;
> > > +
> > > +	/*
> > > +	 * Check if pmem collides with 'System RAM' when section aligned and
> > > +	 * trim it accordingly
> > > +	 */
> > > +	nsio = to_nd_namespace_io(&ndns->dev);
> > > +	start = PHYS_SECTION_ALIGN_DOWN(nsio->res.start);
> > > +	size = resource_size(&nsio->res);
> > > +	if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +				IORES_DESC_NONE) == REGION_MIXED) {
> > > +
> > > +		start = nsio->res.start;
> > > +		start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
> > > +	}
> > > +
> > > +	start = nsio->res.start;
> > > +	size = PHYS_SECTION_ALIGN_UP(start + size) - start;
> > > +	if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +				IORES_DESC_NONE) == REGION_MIXED) {
> > > +		size = resource_size(&nsio->res);
> > > +		end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
> > > +	}
> >
> > This check seems to assume that guest's regular memory layout does not
> > change.  That is, if there is no collision at first, there won't be any
> > later.  Is this a valid assumption?
>
> If platform firmware changes the physical alignment during the
> lifetime of the namespace there's not much we can do.

The physical alignment can be changed as long as it is large enough (see
below).

> Another problem
> not addressed by this patch is firmware choosing to hot plug system
> ram into the same section as persistent memory.

Yes, and it does not have to be a hot-plug operation.  Memory size may be
changed off-line.  Data image can be copied to different guests for instant
deployment, or may be migrated to a different guest.

> As far as I can see
> all we do is ask firmware implementations to respect Linux section
> boundaries and otherwise not change alignments.

In addition to the requirement that the pmem range alignment may not change,
the code also requires that a regular memory range not change later so that
it intersects with a pmem section.  This seems fragile to me since guest
config may vary / change as I mentioned above.

So, shouldn't the driver fail to attach when the range is not aligned to
the section size?  Since we need to place a requirement on firmware anyway,
we can simply state that it must be aligned to 128MiB (at least) on x86.
Then, memory and pmem physical layouts can be changed as long as this
requirement is met.

Thanks,
-Toshi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
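
For reference, the section arithmetic discussed in this thread can be
sketched in plain userspace C.  This is only an illustration, not the
driver code: it assumes 128MiB sections (the x86 figure mentioned above),
and SECTION_SIZE, struct pmem_trim, pmem_section_trim() and
pmem_range_is_section_aligned() are hypothetical names made up for the
sketch.  The first helper computes the start_pad / end_trunc amounts
unconditionally, whereas the patch applies them only when
region_intersects() reports REGION_MIXED; the second models the stricter
"refuse to attach unless section aligned" policy suggested in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128 MiB sections, as stated for x86 in the thread. */
#define SECTION_SIZE		(UINT64_C(1) << 27)
#define SECTION_ALIGN_DOWN(x)	((x) & ~(SECTION_SIZE - 1))
#define SECTION_ALIGN_UP(x)	SECTION_ALIGN_DOWN((x) + SECTION_SIZE - 1)

/* Hypothetical container for the two trim amounts named in the patch. */
struct pmem_trim {
	uint64_t start_pad;	/* bytes skipped at the front of the range */
	uint64_t end_trunc;	/* bytes dropped from the tail of the range */
};

/*
 * Compute how much would have to be trimmed at each end so that the
 * remaining range starts and ends on a section boundary.
 */
static struct pmem_trim pmem_section_trim(uint64_t start, uint64_t size)
{
	struct pmem_trim t;
	uint64_t end;

	assert(size <= UINT64_MAX - start);	/* start + size must not wrap */
	end = start + size;

	t.start_pad = SECTION_ALIGN_UP(start) - start;
	t.end_trunc = end - SECTION_ALIGN_DOWN(end);
	return t;
}

/*
 * Stricter policy: accept the range only if both its start and its size
 * are multiples of the section size, so nothing ever needs trimming.
 */
static bool pmem_range_is_section_aligned(uint64_t start, uint64_t size)
{
	return size != 0 &&
	       start % SECTION_SIZE == 0 &&
	       size % SECTION_SIZE == 0;
}
```

With these assumptions, a 256MiB range starting at 4GiB + 64MiB needs a
64MiB start_pad and a 64MiB end_trunc and fails the strict check, while
the same range starting at exactly 4GiB needs no trimming and passes.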