On Fri, May 29, 2015 at 11:34 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Fri, May 29, 2015 at 11:19 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>> On Fri, May 29, 2015 at 8:03 AM, Toshi Kani <toshi.kani@xxxxxx> wrote:
>>> On Fri, 2015-05-29 at 07:43 -0700, Dan Williams wrote:
>>>> On Fri, May 29, 2015 at 2:11 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>>>> > On Wed, May 27, 2015 at 09:19:04AM -0600, Toshi Kani wrote:
>>>> >> The pmem driver maps NVDIMM with ioremap_nocache() as we cannot
>>>> >> write back the contents of the CPU caches in case of a crash.
>>>> >>
>>>> >> This patch changes to use ioremap_wt(), which provides uncached
>>>> >> writes but cached reads, for improving read performance.
>>>> >>
>>>> >> Signed-off-by: Toshi Kani <toshi.kani@xxxxxx>
>>>> >> ---
>>>> >>  drivers/block/pmem.c | 4 ++--
>>>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>> >>
>>>> >> diff --git a/drivers/block/pmem.c b/drivers/block/pmem.c
>>>> >> index eabf4a8..095dfaa 100644
>>>> >> --- a/drivers/block/pmem.c
>>>> >> +++ b/drivers/block/pmem.c
>>>> >> @@ -139,11 +139,11 @@ static struct pmem_device *pmem_alloc(struct device *dev, struct resource *res)
>>>> >>  	}
>>>> >>
>>>> >>  	/*
>>>> >> -	 * Map the memory as non-cachable, as we can't write back the contents
>>>> >> +	 * Map the memory as write-through, as we can't write back the contents
>>>> >>  	 * of the CPU caches in case of a crash.
>>>> >>  	 */
>>>> >>  	err = -ENOMEM;
>>>> >> -	pmem->virt_addr = ioremap_nocache(pmem->phys_addr, pmem->size);
>>>> >> +	pmem->virt_addr = ioremap_wt(pmem->phys_addr, pmem->size);
>>>> >>  	if (!pmem->virt_addr)
>>>> >>  		goto out_release_region;
>>>> >
>>>> > Dan, Ross, what about this one?
>>>> >
>>>> > ACK to pick it up as a temporary solution?
>>>>
>>>> I see that is_new_memtype_allowed() is updated to disallow some
>>>> combinations, but the manual seems to imply any mixing of memory
>>>> types is unsupported. Which worries me even in the current code
>>>> where we have uncached mappings in the driver, and potentially
>>>> cached DAX mappings handed out to userspace.
>>>
>>> is_new_memtype_allowed() is not to allow some combinations of mixing
>>> of memory types. When it is allowed, the requested type of
>>> ioremap_xxx() is changed to match with the existing map type, so
>>> that mixing of memory types does not happen.
>>
>> Yes, but now if the caller was expecting one memory type and gets
>> another one that is something I think the driver would want to know.
>> At a minimum I don't think we want to get emails about pmem driver
>> performance problems when someone's platform is silently degrading WB
>> to UC for example.
>>
>>> DAX uses vm_insert_mixed(), which does not even check the existing
>>> map type to the physical address.
>>
>> Right, I think that's a problem...
>>
>>>> A general quibble separate from this patch is that we don't have a
>>>> way of knowing if ioremap() will reject or change our requested
>>>> memory type. Shouldn't the driver be explicitly requesting a known
>>>> valid type in advance?
>>>
>>> I agree we need a solution here.
>>>
>>>> Lastly we now have the PMEM API patches from Ross out for review
>>>> where he is assuming cached mappings with non-temporal writes:
>>>> https://lists.01.org/pipermail/linux-nvdimm/2015-May/000929.html.
>>>> This gives us WC semantics on writes which I believe has the nice
>>>> property of reducing the number of write transactions to memory.
>>>> Also, the numbers in the paper seem to be assuming DAX operation,
>>>> but this ioremap_wt() is in the driver and typically behind a file
>>>> system. Are the numbers relevant to that usage mode?
>>>
>>> I have not looked into Ross's changes yet, but they do not seem to
>>> replace the use of ioremap_nocache(). If his changes can use WB type
>>> reliably, yes, we do not need a temporary solution of using
>>> ioremap_wt() in this driver.
>>
>> Hmm, yes you're right, it seems those patches did not change the
>> implementation to use ioremap_cache()... which happens to not be
>> implemented on all architectures. I'll take a look.
>
> Whoa, there! Why would we use non-temporal stores to WB memory to
> access persistent memory? I can see two reasons not to:
>
> 1. As far as I understand it, non-temporal stores to WT should have
> almost identical performance.
>
> 2. Is there any actual architectural guarantee that it's safe to have
> a WB mapping that we use like that? By my reading of the manual,
> MOVNTDQA (as a write to pmem); SFENCE; PCOMMIT; SFENCE on uncached
> memory should be guaranteed to do a durable write. On the other hand,
> it's considerably less clear to me that the same sequence to WB memory
> is safe -- aren't we supposed to stick a CLWB or CLFLUSHOPT in there,
> too, on WB memory? In other words, is there any case in which
> MOVNTDQA or similar acting on a WB mapping could result in a dirty
> cache line?

Depends, see the note in 10.4.6.2, "Some older CPU implementations
(e.g., Pentium M) allowed addresses being written with a non-temporal
store instruction to be updated in-place if the memory type was not WC
and line was already in the cache."

The expectation is that boot_cpu_has(X86_FEATURE_PCOMMIT) is false on
such a CPU so we'll fall back to not using non-temporal stores.
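For reference, here is a minimal sketch of the store sequence under
discussion (non-temporal store; SFENCE; PCOMMIT; SFENCE), gated on the
boot_cpu_has(X86_FEATURE_PCOMMIT) check mentioned above. This is not
the pmem API from Ross's series; the sketch_* helper names and the
exact placement of the fences are my own illustration, assuming x86-64.

/* Illustrative sketch only; helper names are not from the pmem series. */
#include <linux/types.h>
#include <asm/cpufeature.h>

/* PCOMMIT lacks a mnemonic in older assemblers; 66 0F AE F8 is its encoding */
static inline void sketch_pcommit_sfence(void)
{
	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf8" : : : "memory");
	asm volatile("sfence" : : : "memory");
}

/*
 * Durable 64-bit store: MOVNTI bypasses the cache, the first SFENCE
 * drains the write-combining buffers, and PCOMMIT plus a trailing
 * SFENCE commits the queued write to the persistence domain.
 */
static inline void sketch_durable_store(u64 *dst, u64 val)
{
	asm volatile("movnti %1, %0" : "=m" (*dst) : "r" (val));
	asm volatile("sfence" : : : "memory");

	/*
	 * On a CPU without PCOMMIT (including the older parts in the
	 * 10.4.6.2 note) the driver would instead avoid non-temporal
	 * stores and take a different flush path.
	 */
	if (boot_cpu_has(X86_FEATURE_PCOMMIT))
		sketch_pcommit_sfence();
}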