Hi Linus, please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.3

...to receive the libnvdimm update and related changes for 4.3. This update has successfully completed a 0day-kbuild run and has appeared in a linux-next release. The changes outside of the typical drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and the introduction of ZONE_DEVICE + devm_memremap_pages().

This has a minor conflict with a fix that went into v4.2, commit de4a196c02a2 "nfit, nd_blk: BLK status register is only 32 bits", but otherwise merges cleanly with mainline.

--

The following changes since commit cbfe8fa6cd672011c755c3cd85c9ffd4e2d10a6f:

Linux 4.2-rc4 (2015-07-26 12:26:21 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm tags/libnvdimm-for-4.3

for you to fetch changes up to 004f1afbe199e6ab20805b95aefd83ccd24bc5c7:

libnvdimm, pmem: direct map legacy pmem by default (2015-08-28 23:40:05 -0400)

----------------------------------------------------------------
libnvdimm for 4.3:

1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic mechanism for adding device-driver-discovered memory regions to the kernel's direct map. This facility is used by the pmem driver to enable pfn_to_page() operations on the page frames returned by DAX ('direct_access' in 'struct block_device_operations'). For now, the 'memmap' allocation for these "device" pages comes from "System RAM". Support for allocating the memmap from device memory will arrive in a later kernel.

2/ Introduce memremap() to replace usages of ioremap_cache() and ioremap_wt(). memremap() drops the __iomem annotation for these mappings to memory that do not have i/o side effects. The replacement of ioremap_cache() with memremap() is limited to the pmem driver to ease merging the api change in v4.3. Completion of the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem driver, update the VFS DAX implementation and PMEM api to provide persistence guarantees for kernel operations on a DAX mapping.

4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as cacheable to improve performance.

5/ Miscellaneous updates and fixes to libnvdimm including support for issuing "address range scrub" commands, clarifying the optimal 'sector size' of pmem devices, a clarification of the usage of the ACPI '_STA' (status) property for DIMM devices, and other minor fixes.

----------------------------------------------------------------
Christoph Hellwig (4):
devres: add devm_memremap
pmem: switch to devm_ allocations
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
add devm_memremap_pages

Dan Williams (15):
libnvdimm, btt: sparse fix
mm: enhance region_is_ram() to region_intersects()
arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead
cleanup IORESOURCE_CACHEABLE vs ioremap()
arch: introduce memremap()
visorbus: switch from ioremap_cache to memremap
pmem: convert to generic memremap
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
Merge branch 'pmem-api' into libnvdimm-for-next
dax: drop size parameter to ->direct_access()
mm: ZONE_DEVICE for "device memory"
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
libnvdimm, pfn: 'struct page' provider infrastructure
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pmem: direct map legacy pmem by default

Linda Knippers (1):
nfit: Don't check _STA on NVDIMM devices

Randy Dunlap (1):
nvdimm: fix inline function return type warning

Ross Zwisler (7):
pmem, x86: move x86 PMEM API to new pmem.h header
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: clean up conditional pmem includes
pmem: add copy_from_iter_pmem() and clear_pmem()
dax: update I/O path to do proper PMEM flushing
pmem, dax: have direct_access use __pmem annotation
nd_blk: change aperture mapping from WC to WB

Vishal Verma (6):
libnvdimm: Update name of the ars_status_record mask field
libnvdimm: Add DSM support for Address Range Scrub commands
libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZE
libnvdimm, btt: clean up internal interfaces
libnvdimm, btt: consolidate arena validation
libnvdimm, btt: write and validate parent_uuid

yalin wang (1):
nvdimm: change to use generic kvfree()

Documentation/filesystems/Locking | 3 +-
MAINTAINERS | 1 +
arch/arm/include/asm/memory.h | 6 -
arch/arm/mach-clps711x/board-cdb89712.c | 2 +-
arch/arm/mach-shmobile/pm-rcar.c | 2 +-
arch/arm64/include/asm/memory.h | 6 -
arch/ia64/include/asm/io.h | 1 +
arch/ia64/kernel/cyclone.c | 2 +-
arch/ia64/mm/init.c | 4 +-
arch/powerpc/kernel/pci_of_scan.c | 2 +-
arch/powerpc/mm/mem.c | 4 +-
arch/powerpc/sysdev/axonram.c | 7 +-
arch/s390/mm/init.c | 2 +-
arch/sh/include/asm/io.h | 1 +
arch/sh/mm/init.c | 5 +-
arch/sparc/kernel/pci.c | 3 +-
arch/tile/mm/init.c | 2 +-
arch/unicore32/include/asm/memory.h | 6 -
arch/x86/Kconfig | 9 +-
arch/x86/include/asm/cacheflush.h | 73 +-----
arch/x86/include/asm/io.h | 6 -
arch/x86/include/asm/pmem.h | 153 +++++++++++
arch/x86/include/uapi/asm/e820.h | 2 +-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/pmem.c | 79 +-----
arch/x86/mm/init_32.c | 4 +-
arch/x86/mm/init_64.c | 4 +-
arch/xtensa/include/asm/io.h | 1 +
drivers/acpi/Kconfig | 1 +
drivers/acpi/nfit.c | 79 +++---
drivers/acpi/nfit.h | 17 +-
drivers/block/brd.c | 8 +-
drivers/isdn/icn/icn.h | 2 +-
drivers/mtd/devices/slram.c | 2 +-
drivers/mtd/nand/diskonchip.c | 2 +-
drivers/mtd/onenand/generic.c | 2 +-
drivers/nvdimm/Kconfig | 23 ++
drivers/nvdimm/Makefile | 5 +
drivers/nvdimm/btt.c | 50 +---
drivers/nvdimm/btt.h | 3 +
drivers/nvdimm/btt_devs.c | 215 ++++------------
drivers/nvdimm/claim.c | 201 +++++++++++++++
drivers/nvdimm/dimm_devs.c | 5 +-
drivers/nvdimm/e820.c | 87 +++++++
drivers/nvdimm/namespace_devs.c | 89 ++++++-
drivers/nvdimm/nd-core.h | 9 +
drivers/nvdimm/nd.h | 67 ++++-
drivers/nvdimm/pfn.h | 35 +++
drivers/nvdimm/pfn_devs.c | 337 +++++++++++++++++++++++++
drivers/nvdimm/pmem.c | 245 +++++++++++++++---
drivers/nvdimm/region.c | 2 +
drivers/nvdimm/region_devs.c | 20 ++
drivers/pci/probe.c | 3 +-
drivers/pnp/manager.c | 2 -
drivers/s390/block/dcssblk.c | 10 +-
drivers/scsi/aic94xx/aic94xx_init.c | 7 +-
drivers/scsi/arcmsr/arcmsr_hba.c | 5 +-
drivers/scsi/mvsas/mv_init.c | 15 +-
drivers/scsi/sun3x_esp.c | 2 +-
drivers/staging/comedi/drivers/ii_pci20kc.c | 1 +
drivers/staging/unisys/visorbus/visorchannel.c | 16 +-
drivers/staging/unisys/visorbus/visorchipset.c | 17 +-
drivers/tty/serial/8250/8250_core.c | 2 +-
drivers/video/fbdev/ocfb.c | 1 -
drivers/video/fbdev/s1d13xxxfb.c | 3 +-
drivers/video/fbdev/stifb.c | 1 +
fs/block_dev.c | 4 +-
fs/dax.c | 62 +++--
include/asm-generic/memory_model.h | 6 +
include/linux/blkdev.h | 8 +-
include/linux/io-mapping.h | 2 +-
include/linux/io.h | 33 +++
include/linux/libnvdimm.h | 4 +
include/linux/memory_hotplug.h | 5 +-
include/linux/mm.h | 9 +-
include/linux/mmzone.h | 23 ++
include/linux/mtd/map.h | 2 +-
include/linux/pmem.h | 115 ++++++---
include/uapi/linux/ndctl.h | 12 +-
include/video/vga.h | 2 +-
kernel/Makefile | 2 +
kernel/memremap.c | 190 ++++++++++++++
kernel/resource.c | 61 +++--
lib/Kconfig | 3 +
lib/devres.c | 13 +-
lib/pci_iomap.c | 7 +-
mm/Kconfig | 17 ++
mm/memory_hotplug.c | 14 +-
mm/page_alloc.c | 3 +
tools/testing/nvdimm/Kbuild | 13 +-
tools/testing/nvdimm/test/iomap.c | 85 ++++++-
tools/testing/nvdimm/test/nfit.c | 209 ++++++++++-----
92 files changed, 2142 insertions(+), 745 deletions(-)
create mode 100644 arch/x86/include/asm/pmem.h
create mode 100644 drivers/nvdimm/claim.c
create mode 100644 drivers/nvdimm/e820.c
create mode 100644 drivers/nvdimm/pfn.h
create mode 100644 drivers/nvdimm/pfn_devs.c
create mode 100644 kernel/memremap.c

commit 5e32940621eb62064d98f42c9889db71b0368bde
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sat Jul 11 10:02:46 2015 -0400

libnvdimm, btt: sparse fix

Fix:
    drivers/nvdimm/btt.c:635:29: warning: restricted __le64 degrades to integer

Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit ec92777f2ba93c00387b8fe53780c25adc57c744
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 9 13:25:35 2015 -0600

libnvdimm: Update name of the ars_status_record mask field

The spec suggests that this is a simple 'length' field, not a mask. Update the name accordingly.

Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 39c686b862cdb2049b90e095b6c6c727b2a7ab60
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 9 13:25:36 2015 -0600

libnvdimm: Add DSM support for Address Range Scrub commands

Add support for the three ARS DSM commands:
- Query ARS Capabilities - Queries the firmware to check if a given range supports scrub, and if so, which type (persistent vs. volatile)
- Start ARS - Starts a scrub for a given range/type
- Query ARS Status - Checks status of a previously started scrub, and provides the error logs if any.

The commands are described by the example DSM spec at:
http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Also add these commands to the nfit_test test framework, and return canned data.

Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 6b47496a6fc81816e7edaf8224dfb88e402a05f5
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Thu Jul 23 11:58:48 2015 -0600

libnvdimm, pmem: Change pmem physical sector size to PAGE_SIZE

Based on a patch:
c8fa317 brd: Request from fdisk 4k alignment
by Boaz Harrosh, allow fdisk to create properly aligned partitions for DAX. This will also cause mkfs.ext4 to emit a warning if using a file system block size of less than PAGE_SIZE.
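The alignment that a PAGE_SIZE physical sector size encourages partitioning tools to honor can be checked with simple arithmetic. The userspace sketch below assumes 512-byte logical sectors and a 4096-byte PAGE_SIZE (as on x86); the helper names are hypothetical and are not part of fdisk or libnvdimm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch: 512-byte logical sectors and a 4 KiB
 * PAGE_SIZE, as on x86. */
#define MODEL_LOGICAL_SECTOR 512u
#define MODEL_PAGE_SIZE      4096u

/* True if a partition starting at logical sector 'start_lba' begins on a
 * 'phys_block'-byte boundary, so its file system blocks can be page aligned. */
static bool model_partition_is_aligned(uint64_t start_lba, uint32_t phys_block)
{
	assert(phys_block % MODEL_LOGICAL_SECTOR == 0);
	return (start_lba * MODEL_LOGICAL_SECTOR) % phys_block == 0;
}

/* Round a proposed start sector up to the next aligned sector, roughly what
 * a partitioning tool does once it sees a 4 KiB physical sector size. */
static uint64_t model_align_start_lba(uint64_t start_lba, uint32_t phys_block)
{
	uint64_t per_block = phys_block / MODEL_LOGICAL_SECTOR;

	assert(per_block > 0);
	return (start_lba + per_block - 1) / per_block * per_block;
}
```

For example, a partition at the classic DOS offset of sector 63 is not 4 KiB aligned and would be moved to sector 64, while the common default of sector 2048 already is aligned.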
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Elliott, Robert <Elliott@xxxxxx>
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Acked-by: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
Acked-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 60e95f43fc8573e81f54b0c1e0bc542c2260d956
Author: Linda Knippers <linda.knippers@xxxxxx>
Date: Wed Jul 22 16:17:22 2015 -0400

nfit: Don't check _STA on NVDIMM devices

The _STA only applies to the root device, not the individual NVDIMMs, so don't check here. NVDIMM device state flags are checked elsewhere.

Signed-off-by: Linda Knippers <linda.knippers@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit f6ef5a2a50816b58e3126206de13d0b9fdf89df5
Author: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Date: Tue Jul 28 12:27:01 2015 -0700

nvdimm: fix inline function return type warning

Fix multiple build warnings when CONFIG_BTT is not enabled:

    In file included from ../drivers/nvdimm/bus.c:29:0:
    ../drivers/nvdimm/nd.h:169:15: warning: return type defaults to 'int' [-Wreturn-type]
     static inline nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata)
                   ^

Signed-off-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: linux-nvdimm@xxxxxxxxxxxx
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 124fe20d94630b6f173dae5eb815e6e6e350c72d
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:05 2015 -0400

mm: enhance region_is_ram() to region_intersects()

region_is_ram() is used to prevent the establishment of aliased mappings to physical "System RAM" with incompatible cache settings. However, it uses "-1" to indicate both "unknown" memory ranges (ranges not described by platform firmware) and "mixed" ranges (where the parameters describe a range that partially overlaps "System RAM").
Fix this up by explicitly tracking the "unknown" vs "mixed" resource cases and returning REGION_INTERSECTS, REGION_MIXED, or REGION_DISJOINT. This rewrite also adds support for detecting when the requested region completely eclipses all of a resource. Note, the implementation treats overlaps between "unknown" and the requested memory type as REGION_INTERSECTS.

Finally, other memory types can be passed in by name; for now the only usage is "System RAM".

Suggested-by: Luis R. Rodriguez <mcgrof@xxxxxxxx>
Reviewed-by: Toshi Kani <toshi.kani@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 2584cf83578c26db144730ef498f4070f82ee3ea
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:05 2015 -0400

arch, drivers: don't include <asm/io.h> directly, use <linux/io.h> instead

Preparation for uniform definition of ioremap, ioremap_wc, ioremap_wt, and ioremap_cache, tree-wide.

Acked-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 92b19ff50e8f242392d78b2aacc5b5b672f1796b
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400

cleanup IORESOURCE_CACHEABLE vs ioremap()

Quoting Arnd:
    I was thinking the opposite approach and basically removing all uses of IORESOURCE_CACHEABLE from the kernel. There are only a handful of them, and we can probably replace them all with hardcoded ioremap_cached() calls in the cases they are actually useful.

All existing usages of IORESOURCE_CACHEABLE call ioremap() instead of ioremap_nocache() if the resource is cacheable; however, ioremap() is uncached by default. Clearly none of the existing usages care about the cacheability. Particularly devm_ioremap_resource() never worked as advertised since it always fell back to plain ioremap().

Clean this up as the new direction we want is to convert ioremap_<type>() usages to memremap(..., flags).
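The three-way result described in the region_intersects() commit above can be illustrated with a small userspace model. It is a simplification under stated assumptions: resources are a flat array with inclusive bounds (the kernel walks the iomem_resource tree and also matches resource flags), the REGION_* values are local to the sketch, and only a plain overlap test is used. It does keep the documented rule that a range covering "System RAM" plus memory not described by any resource still reports REGION_INTERSECTS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Result names follow the commit message; the values are this sketch's own. */
enum { REGION_DISJOINT, REGION_INTERSECTS, REGION_MIXED };

/* Hypothetical flattened resource entry with inclusive bounds. */
struct model_resource {
	uint64_t start, end;
	const char *name;
};

/* Example map: RAM, then an undescribed ("unknown") gap, then legacy pmem. */
static const struct model_resource model_example_map[] = {
	{ 0x0000, 0x0fff, "System RAM" },
	/* 0x1000 - 0x1fff: not described by any resource */
	{ 0x2000, 0x2fff, "Persistent Memory (legacy)" },
};
#define MODEL_EXAMPLE_LEN (sizeof(model_example_map) / sizeof(model_example_map[0]))

static int model_region_intersects(const struct model_resource *res, size_t n,
				   uint64_t start, uint64_t size, const char *name)
{
	uint64_t end = start + size - 1;
	int type = 0, other = 0;

	for (size_t i = 0; i < n; i++) {
		/* Ignore resources that do not overlap [start, end] at all. */
		if (res[i].end < start || res[i].start > end)
			continue;
		if (strcmp(res[i].name, name) == 0)
			type++;
		else
			other++;
	}
	/* Gaps count as neither type nor other, so "unknown" + requested
	 * type yields REGION_INTERSECTS rather than REGION_MIXED. */
	if (other == 0)
		return type ? REGION_INTERSECTS : REGION_DISJOINT;
	return type ? REGION_MIXED : REGION_DISJOINT;
}
```

With this map, a request for 0x800-0x17ff (RAM plus the gap) reports REGION_INTERSECTS, while 0x800-0x27ff also touches the pmem resource and reports REGION_MIXED.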
Suggested-by: Arnd Bergmann <arnd@xxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 92281dee825f6d2eb07c441437e4196a44b0861c
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400

arch: introduce memremap()

Existing users of ioremap_cache() are mapping memory that is known in advance to not have i/o side effects. These users are forced to cast away the __iomem annotation, or otherwise neglect to fix the sparse errors thrown when dereferencing pointers to this memory. Provide memremap() as a non __iomem annotated ioremap_*() in the case when ioremap is otherwise a pointer to cacheable memory. Empirically, ioremap_<cacheable-type>() call sites are seeking memory-like semantics (e.g. speculative reads, and prefetching permitted).

memremap() is a break from the ioremap implementation pattern of adding a new memremap_<type>() for each mapping type and having silent compatibility fallbacks. Instead, the implementation defines flags that are passed to the central memremap(), and if a mapping type is not supported by an arch, memremap() returns NULL.

We introduce a memremap prototype as a trivial wrapper of ioremap_cache() and ioremap_wt(). Later, once all ioremap_cache() and ioremap_wt() usage has been removed from drivers, we teach archs to implement arch_memremap() with the ability to strictly enforce the mapping type.

Cc: Arnd Bergmann <arnd@xxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 3103dc0304fd9c8ab576977cd98140d4fbac1730
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 10 23:07:06 2015 -0400

visorbus: switch from ioremap_cache to memremap

In preparation for deprecating ioremap_cache(), convert its usage in visorbus to memremap.
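The "flags instead of silent fallbacks" design described for memremap() can be modeled in userspace. In this sketch only the flag names MEMREMAP_WB and MEMREMAP_WT follow the kernel; their values, the backing array, the imaginary architecture's support mask, the WB-before-WT preference order and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names follow the kernel's memremap() flags; values are this sketch's own. */
enum {
	MEMREMAP_WB = 1 << 0,
	MEMREMAP_WT = 1 << 1,
};

/* Hypothetical stand-ins: a block of "physical" memory and the mapping
 * types an imaginary architecture can honor (WT only). */
static unsigned char model_phys[4096];
static const unsigned long model_arch_supported = MEMREMAP_WT;

/* Honor the first requested type the arch supports, preferring WB over WT.
 * If none of the requested types is supported, return NULL instead of
 * silently substituting a different cache type. */
static void *model_memremap(size_t offset, size_t size, unsigned long flags)
{
	static const unsigned long preference[] = { MEMREMAP_WB, MEMREMAP_WT };

	if (offset > sizeof(model_phys) || size > sizeof(model_phys) - offset)
		return NULL;
	for (size_t i = 0; i < sizeof(preference) / sizeof(preference[0]); i++) {
		unsigned long type = preference[i];

		if ((flags & type) && (model_arch_supported & type))
			return model_phys + offset;
	}
	return NULL;
}
```

A caller that can tolerate either cache type passes both flags; a caller that needs WB specifically gets NULL on this imaginary architecture and has to handle that failure explicitly.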
Cc: Benjamin Romer <benjamin.romer@xxxxxxxxxx>
Cc: David Kershner <david.kershner@xxxxxxxxxx>
Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit e836a256e8fd579c9d7a3685f22981225a1ca451
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Wed Aug 12 18:42:56 2015 -0400

pmem: convert to generic memremap

Kill arch_memremap_pmem() and just let the architecture specify the flags to be passed to memremap(). Default to writethrough.

Suggested-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit fbde1414acc0440024083bf0c391b259bcfc4826
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:07 2015 -0600

libnvdimm, btt: clean up internal interfaces

Consolidate the parameters passed to arena_is_valid into just nd_btt and an info block to increase reusability. Similarly, btt_arena_write_layout doesn't need to be passed a uuid, as it can be obtained from arena->nd_btt.

Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit ab45e7632717b811e0786e46ca5ad279cb731b66
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:08 2015 -0600

libnvdimm, btt: consolidate arena validation

Use arena_is_valid as a common routine for checking the validity of an info block from both discover_arenas and nd_btt_probe. As a result, don't check for validity of the BTT's UUID and lbasize. The checksum in the BTT info block guarantees self-consistency, and when we're called from nd_btt_probe, we don't have a valid uuid or lbasize available to check against.

Also clean up to return a bool instead of an int.
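Because the info block carries its own checksum, validity can be decided from the block alone, which is why arena_is_valid no longer needs a UUID or lbasize from the caller. The sketch below illustrates that idea with a heavily trimmed, hypothetical info-block layout and a checksum loosely modeled on a Fletcher-style 64-bit sum over 32-bit words, computed with the checksum field zeroed. The real BTT info block has many more fields, and endianness conversion is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fletcher-style 64-bit checksum over 32-bit words (host byte order). */
static uint64_t model_fletcher64(const uint32_t *buf, size_t nwords)
{
	uint32_t lo32 = 0;
	uint64_t hi32 = 0;

	for (size_t i = 0; i < nwords; i++) {
		lo32 += buf[i];
		hi32 += lo32;
	}
	return hi32 << 32 | lo32;
}

/* Hypothetical, heavily trimmed info block (32 bytes, no padding). */
struct model_btt_info {
	char sig[16];
	uint32_t external_lbasize;
	uint32_t pad;
	uint64_t checksum;
};

static const char model_btt_sig[16] = "BTT_ARENA_INFO";

/* Checksum of the block with its own checksum field treated as zero. */
static uint64_t model_info_checksum(struct model_btt_info info)
{
	uint32_t words[sizeof(info) / sizeof(uint32_t)];

	info.checksum = 0;
	memcpy(words, &info, sizeof(info));
	return model_fletcher64(words, sizeof(words) / sizeof(words[0]));
}

/* Build a self-consistent block, as a writer of the layout would. */
static struct model_btt_info model_make_info(uint32_t lbasize)
{
	struct model_btt_info info = { .external_lbasize = lbasize };

	memcpy(info.sig, model_btt_sig, sizeof(info.sig));
	info.checksum = model_info_checksum(info);
	return info;
}

/* Validity depends only on the block itself: signature plus checksum. */
static bool model_arena_is_valid(struct model_btt_info info)
{
	if (memcmp(info.sig, model_btt_sig, sizeof(info.sig)) != 0)
		return false;
	return info.checksum == model_info_checksum(info);
}
```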
Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 6ec689542b5bc516187917d49b112847dfb75b0b
Author: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Date: Wed Jul 29 14:58:09 2015 -0600

libnvdimm, btt: write and validate parent_uuid

When a BTT is instantiated on a namespace it must validate that the namespace uuid matches the 'parent_uuid' stored in the btt superblock. This property enforces that changing the namespace UUID invalidates all former BTT instances on that storage. For "IO namespaces" that don't have a label or UUID, the parent_uuid is set to zero, and this validation is skipped. For such cases, old BTTs have to be invalidated by forcing the namespace to raw mode, and overwriting the BTT info blocks.

Based on a patch by Dan Williams <dan.j.williams@xxxxxxxxx>

Signed-off-by: Vishal Verma <vishal.l.verma@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 7d3dcf26a6559fa82af3f53e2c8b163cec95fdaf
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 10 23:07:07 2015 -0400

devres: add devm_memremap

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 708ab62bef1ed3a3cf065a4138bd87f5d083cfeb
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 10 23:07:08 2015 -0400

pmem: switch to devm_ allocations

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
[djbw: tools/testing/nvdimm/ and memunmap_pmem support]
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 7a67832c7e44c20935c5d6f2264035a0f7bf0d8f
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Wed Aug 19 00:34:34 2015 -0400

libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option

We currently register a platform device for e820 type-12 memory and register an nvdimm bus beneath it. Registering the platform device triggers the device-core machinery to probe for a driver, but that search currently comes up empty.
Building the nvdimm-bus registration into the e820_pmem platform device registration in this way forces libnvdimm to be built-in. Instead, convert the built-in portion of CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the rest of the logic to the driver for e820_pmem, for the following reasons:

1/ Letting e820_pmem support be a module allows building and testing libnvdimm.ko changes without rebooting

2/ All the normal policy around modules can be applied to e820_pmem (unbind to disable and/or blacklisting the module from loading by default)

3/ Moving the driver to a generic location and converting it to scan "iomem_resource" rather than "e820.map" means any other architecture can take advantage of this simple nvdimm resource discovery mechanism by registering a resource named "Persistent Memory (legacy)"

Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 40603526569b304dd92f720f2f8ab11e828ea145
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:36 2015 -0600

pmem, x86: move x86 PMEM API to new pmem.h header

Move the x86 PMEM API implementation out of asm/cacheflush.h and into its own header asm/pmem.h. This will allow members of the PMEM API to be more easily identified on this and other architectures.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Suggested-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 18279b467a9d89afe44afbc19d768e834dbf4545
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:37 2015 -0600

pmem: remove layer when calling arch_has_wmb_pmem()

Prior to this change arch_has_wmb_pmem() was only called by arch_has_pmem_api(). Both arch_has_wmb_pmem() and arch_has_pmem_api() checked to make sure that CONFIG_ARCH_HAS_PMEM_API was enabled.
Instead, remove the old arch_has_wmb_pmem() wrapper to be rid of one extra layer of indirection and the redundant CONFIG_ARCH_HAS_PMEM_API check. Rename __arch_has_wmb_pmem() to arch_has_wmb_pmem() since we no longer have a wrapper, and just have arch_has_pmem_api() call the architecture-specific arch_has_wmb_pmem() directly.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 4a370df5534ef727cba9a9d74bf22e0609f91d6e
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:38 2015 -0600

pmem, x86: clean up conditional pmem includes

Prior to this change x86_64 used the pmem defines in arch/x86/include/asm/pmem.h, and UM used the default ones at the top of include/linux/pmem.h. The inclusion or exclusion in linux/pmem.h was controlled by CONFIG_ARCH_HAS_PMEM_API, but the ones in asm/pmem.h were controlled by ARCH_HAS_NOCACHE_UACCESS.

Instead, control them both with CONFIG_ARCH_HAS_PMEM_API so that it's clear that they are related and we don't run into the possibility where they are both included or excluded.

Also remove a bunch of stale function prototypes meant for UM in asm/pmem.h - these just conflicted with the inline defaults in linux/pmem.h and gave compile errors.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 5de490daec8b6354b90d5c9d3e2415b195f5adb6
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:39 2015 -0600

pmem: add copy_from_iter_pmem() and clear_pmem()

Add support for two new PMEM APIs, copy_from_iter_pmem() and clear_pmem(). copy_from_iter_pmem() is used to copy data from an iterator into a PMEM buffer. clear_pmem() zeros a PMEM memory range.
Both of these new APIs must be explicitly ordered using a wmb_pmem() function call and are implemented in such a way that the wmb_pmem() will make the stores to PMEM durable. Because both APIs are unordered, they can be called as needed without introducing any unwanted memory barriers.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 2765cfbb342c727c3fd47b165196cb16da158022
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:40 2015 -0600

dax: update I/O path to do proper PMEM flushing

Update the DAX I/O path so that all operations that store data (I/O writes, zeroing blocks, punching holes, etc.) properly synchronize the stores to media using the PMEM API. This ensures that the data DAX is writing is durable on media before the operation completes.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit e2e05394e4a3420dab96f728df4531893494e15d
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Tue Aug 18 13:55:41 2015 -0600

pmem, dax: have direct_access use __pmem annotation

Update the annotation for the kaddr pointer returned by direct_access() so that it is a __pmem pointer. This is consistent with the PMEM driver and with how this direct_access() pointer is used in the DAX code.
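The ordering contract described for copy_from_iter_pmem() and clear_pmem(), and relied on by the DAX I/O path above, is: issue any number of unordered stores, then one wmb_pmem() before reporting completion. The userspace model below illustrates only that contract and is not the kernel API. The function names are hypothetical, "durability" is simulated by copying a staging buffer into a second buffer, and no real cache flushing or non-temporal stores take place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PMEM_SIZE 256

/* Hypothetical model: stores land in model_cpu_view and only reach
 * model_durable (standing in for media) when the barrier runs. */
static unsigned char model_cpu_view[MODEL_PMEM_SIZE];
static unsigned char model_durable[MODEL_PMEM_SIZE];
static int model_barriers;

/* Unordered copy, in the role of copy_from_iter_pmem(): no barrier. */
static void model_copy_to_pmem(size_t off, const void *src, size_t n)
{
	assert(off <= MODEL_PMEM_SIZE && n <= MODEL_PMEM_SIZE - off);
	memcpy(model_cpu_view + off, src, n);
}

/* Unordered zeroing, in the role of clear_pmem(): no barrier. */
static void model_clear_pmem(size_t off, size_t n)
{
	assert(off <= MODEL_PMEM_SIZE && n <= MODEL_PMEM_SIZE - off);
	memset(model_cpu_view + off, 0, n);
}

/* In the role of wmb_pmem(): everything stored so far becomes durable. */
static void model_wmb_pmem(void)
{
	memcpy(model_durable, model_cpu_view, sizeof(model_durable));
	model_barriers++;
}

/* DAX-style block write: copy the payload, zero the rest of the block,
 * then issue a single barrier. Returns the number of barriers issued. */
static int model_dax_write_block(size_t off, const void *src, size_t n,
				 size_t block)
{
	int before = model_barriers;

	assert(n <= block);
	model_copy_to_pmem(off, src, n);
	model_clear_pmem(off + n, block - n);
	model_wmb_pmem();
	return model_barriers - before;
}
```

The design point is that the two store helpers stay barrier-free, so a caller that combines several of them pays for exactly one barrier at the end.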
Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit a06a7576526e10a99ea7721533e7f2df3e26baad
Author: yalin wang <yalin.wang2010@xxxxxxxxx>
Date: Thu Aug 27 19:35:48 2015 -0400

nvdimm: change to use generic kvfree()

Signed-off-by: yalin wang <yalin.wang2010@xxxxxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 67a3e8fe90156d41cd480d3dfbb40f3bc007c262
Author: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Date: Thu Aug 27 13:14:20 2015 -0600

nd_blk: change aperture mapping from WC to WB

This should result in a pretty sizeable performance gain for reads. For rough comparison I did some simple read testing using PMEM to compare reads of write combining (WC) mappings vs write-back (WB). This was done on a random lab machine.

PMEM reads from a write combining mapping:

    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
    100000+0 records in
    100000+0 records out
    409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

PMEM reads from a write-back mapping:

    # dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
    1000000+0 records in
    1000000+0 records out
    4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

To be able to safely support a write-back aperture I needed to add support for the "read flush" _DSM flag, as outlined in the DSM spec:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag tells the ND BLK driver that it needs to flush the cache lines associated with the aperture after the aperture is moved but before any new data is read. This ensures that any stale cache lines from the previous contents of the aperture will be discarded from the processor cache, and the new data will be read properly from the DIMM.
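The "read flush" sequence above (move the aperture, invalidate the cache lines that cover it, then read) depends on flushing every cache line that overlaps the aperture. A userspace sketch of that range arithmetic follows. It assumes 64-byte cache lines, and model_flush_line() is a hypothetical stand-in that only counts lines, where real code would issue an invalidating instruction such as clflush.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define MODEL_CACHELINE 64u

static size_t model_lines_flushed;

/* Hypothetical stand-in for a per-line invalidate; it only counts. */
static void model_flush_line(uintptr_t line)
{
	(void)line;
	model_lines_flushed++;
}

/* Visit every cache line overlapping [addr, addr + size), starting from the
 * line that contains 'addr'. Returns the number of lines touched. */
static size_t model_flush_range(uintptr_t addr, size_t size)
{
	uintptr_t p = addr & ~(uintptr_t)(MODEL_CACHELINE - 1);
	uintptr_t end = addr + size;
	size_t before = model_lines_flushed;

	for (; p < end; p += MODEL_CACHELINE)
		model_flush_line(p);
	return model_lines_flushed - before;
}
```

Note that an unaligned range can touch one more line than its length suggests: 64 bytes starting at 0x1010 straddle the lines at 0x1000 and 0x1040.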
We know that the cache lines are clean and will be discarded without any writeback because either a) the previous aperture operation was a read, and we never modified the contents of the aperture, or b) the previous aperture operation was a write and we must have written back the dirtied contents of the aperture to the DIMM before the I/O was completed.

In order to add support for the "read flush" flag I needed to add a generic routine to invalidate cache lines, mmio_flush_range(). This is protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently only supported on x86.

Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 4a9bf88a5caa8495b5eb2b738d5fb40924bbc538
Merge: a06a7576526e 67a3e8fe9015
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu Aug 27 19:40:26 2015 -0400

Merge branch 'pmem-api' into libnvdimm-for-next

commit cb389b9c0e00c30c9daf20287f7d91e2466edbb1
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Fri Aug 7 17:41:00 2015 -0400

dax: drop size parameter to ->direct_access()

None of the implementations currently use it. The common bdev_direct_access() entry point handles all the size checks before calling ->direct_access().

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 012dcef3f058385268630c0003e9b7f8dcafbeb4
Author: Christoph Hellwig <hch@xxxxxx>
Date: Fri Aug 7 17:41:01 2015 -0400

mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h

Three architectures already define these, and we'll need them generically soon.
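These helpers are plain shifts by PAGE_SHIFT. A minimal userspace sketch follows, assuming 4 KiB pages (a PAGE_SHIFT of 12) and 64-bit physical addresses; the model_ names are used to avoid colliding with identifiers reserved for the implementation and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12

typedef uint64_t model_phys_addr_t;

/* Physical address -> page frame number: drop the in-page offset bits. */
#define model_phys_to_pfn(paddr) \
	((unsigned long)((model_phys_addr_t)(paddr) >> MODEL_PAGE_SHIFT))

/* Page frame number -> physical address of the first byte of that frame.
 * Widen before shifting so large pfns do not overflow on 32-bit builds. */
#define model_pfn_to_phys(pfn) \
	((model_phys_addr_t)(pfn) << MODEL_PAGE_SHIFT)
```

A round trip through both helpers rounds an address down to the start of its page, for example 0x12345678 becomes 0x12345000.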
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 033fbae988fcb67e5077203512181890848b8e90
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sun Aug 9 15:29:06 2015 -0400

mm: ZONE_DEVICE for "device memory"

While pmem is usable as a block device or via DAX mappings to userspace there are several usage scenarios that cannot target pmem due to its lack of struct page coverage. In preparation for "hot plugging" pmem into the vmemmap add ZONE_DEVICE as a new zone to tag these pages separately from the ones that are subject to standard page allocations. Importantly "device memory" can be removed at will by userspace unbinding the driver of the device.

Having a separate zone prevents allocation and otherwise marks these pages as distinct from typical uniform memory. Device memory has different lifetime and performance characteristics than RAM. However, since we have run out of ZONES_SHIFT bits this functionality currently depends on sacrificing ZONE_DMA.

Cc: H. Peter Anvin <hpa@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Jerome Glisse <j.glisse@xxxxxxxxx>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 41e94a851304f7acac840adec4004f8aeee53ad4
Author: Christoph Hellwig <hch@xxxxxx>
Date: Mon Aug 17 16:00:35 2015 +0200

add devm_memremap_pages

This behaves like devm_memremap except that it ensures we have page structures available that can back the region.
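To make the ZONES_SHIFT constraint from the ZONE_DEVICE commit above concrete: the zone index is stored in a few bits of page->flags, so the number of zones that can be configured is bounded by 2^ZONES_SHIFT. The sketch below assumes a two-bit zone field and an x86_64-style zone list of ZONE_DMA, ZONE_DMA32, ZONE_NORMAL and ZONE_MOVABLE; both are assumptions about a typical configuration, not values taken from this series.

```c
#include <assert.h>

/* Bits needed to number 'nr_zones' zones: the smallest b with 2^b >= nr. */
static unsigned int model_zone_bits(unsigned int nr_zones)
{
	unsigned int bits = 0;

	while ((1u << bits) < nr_zones)
		bits++;
	return bits;
}

/* Assumed width of the zone-index field in page->flags. */
#define MODEL_ZONES_SHIFT 2

/* Hypothetical zone counts for an x86_64-style configuration. */
enum {
	MODEL_ZONES_BASE = 4,          /* DMA, DMA32, NORMAL, MOVABLE */
	MODEL_ZONES_PLUS_DEVICE = 5,   /* the above plus DEVICE */
	MODEL_ZONES_DEVICE_NO_DMA = 4, /* DMA32, NORMAL, MOVABLE, DEVICE */
};

/* Does a zone list of this size fit in the assumed field? */
static int model_zones_fit(unsigned int nr_zones)
{
	return model_zone_bits(nr_zones) <= MODEL_ZONES_SHIFT;
}
```

Under these assumptions a fifth zone needs a third bit, while trading ZONE_DMA for ZONE_DEVICE keeps the count at four.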
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
[djbw: catch attempts to remap RAM, drop flags]
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 96601adb745186ccbcf5b078d4756f13381ec2af
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 24 18:29:38 2015 -0400

x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB

Given that a write-back (WB) mapping plus non-temporal stores is expected to be the most efficient way to access PMEM, update the definition of ARCH_HAS_PMEM_API to imply arch support for WB-mapped-PMEM. This is needed as a prerequisite for adding PMEM to the direct map and mapping it with struct page.

The above clarification for X86_64 means that memcpy_to_pmem() is permitted to use the non-temporal arch_memcpy_to_pmem() rather than needlessly fall back to default_memcpy_to_pmem() when the pcommit instruction is not available. When arch_memcpy_to_pmem() is not guaranteed to flush writes out of cache, i.e. on older X86_32 implementations where non-temporal stores may just dirty cache, ARCH_HAS_PMEM_API is simply disabled.

The default fallback for persistent memory handling remains. Namely, map it with the WT (write-through) cache-type and hope for the best.

arch_has_pmem_api() is updated to only indicate whether the arch provides the proper helpers to meet the minimum "writes are visible outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()". Code that cares whether wmb_pmem() actually flushes writes to pmem must now call arch_has_wmb_pmem() directly.

Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit e1455744b27c9e6115c3508a7b2902157c2c4347
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu Jul 30 17:57:47 2015 -0400

libnvdimm, pfn: 'struct page' provider infrastructure

Implement the base infrastructure for libnvdimm PFN devices. Similar to BTT devices they take a namespace as a backing device and layer functionality on top. In this case the functionality is reserving space for an array of 'struct page' entries to be handed out through pfn_to_page(). For now this is just the basic libnvdimm-device-model for configuring the base PFN device.

As the namespace claiming mechanism for PFN devices is mostly identical to BTT devices, drivers/nvdimm/claim.c is created to house the common bits.

Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 32ab0a3f51701cb37ab960635254d5f84ec3de0a
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Sat Aug 1 02:16:37 2015 -0400

libnvdimm, pmem: 'struct page' for pmem

Enable the pmem driver to handle PFN device instances. Attaching a pmem namespace to a pfn device triggers the driver to allocate and initialize struct page entries for pmem. Memory capacity for this allocation comes exclusively from RAM for now, which is suitable for low PMEM to RAM ratios.

This mechanism will be expanded later for setting an "allocate from PMEM" policy.
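The "low PMEM to RAM ratio" caveat follows from the size of the memmap: one struct page per page of pmem, allocated from RAM. The back-of-the-envelope sketch below assumes 4 KiB pages and a 64-byte struct page (typical of x86_64, but configuration dependent); under those assumptions the memmap costs about 1/64 of the pmem capacity, or 16 GiB of RAM per TiB of pmem.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages and a 64-byte struct page. */
#define MODEL_PAGE_SIZE        4096ull
#define MODEL_STRUCT_PAGE_SIZE 64ull

/* RAM consumed by the "struct page" array covering 'pmem_bytes' of pmem,
 * counting a partial trailing page as a whole page. */
static uint64_t model_memmap_bytes(uint64_t pmem_bytes)
{
	uint64_t pages = (pmem_bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	return pages * MODEL_STRUCT_PAGE_SIZE;
}
```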
Cc: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 004f1afbe199e6ab20805b95aefd83ccd24bc5c7
Author: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon Aug 24 19:20:23 2015 -0400

libnvdimm, pmem: direct map legacy pmem by default

The expectation is that the legacy / non-standard pmem discovery method (e820 type-12) will only ever be used to describe small quantities of persistent memory. Larger capacities will be described via the ACPI NFIT. When "allocate struct page from pmem" support is added this default policy can be overridden by assigning a legacy pmem namespace to a pfn device; however, this would only be necessary if a platform used the legacy mechanism to define a very large range.

Cc: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>