On Mon, Jun 4, 2018 at 11:08 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote: > On Mon, Jun 04, 2018 at 10:39:48AM -0700, Dan Williams wrote: >> On Mon, Jun 4, 2018 at 10:08 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote: >> > On Sat, Jun 02, 2018 at 10:23:20PM -0700, Dan Williams wrote: >> >> +static inline int set_mce_nospec(unsigned long pfn) >> >> +{ >> >> + int rc; >> >> + >> >> + rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1); >> > >> > You should really do the decoy_addr thing here that I had in mce_unmap_kpfn(). >> > Putting the virtual address of the page you mustn't accidentally prefetch >> > from into a register is a pretty good way to make sure that the processor >> > does do a prefetch. >> >> Maybe I'm misreading, but doesn't that make the page completely >> inaccessible? We still want to read pmem through the driver and the >> linear mapping with memcpy_mcsafe(). Alternatively I could just drop >> this patch and setup a private / alias mapping for the pmem driver to >> use. It seems aliased mappings would be the safer option, but I want >> to make sure I've comprehended your suggestion correctly? > > I'm OK with the call to set_memory_uc() to make this uncacheable > instead of set_memory_np() to make it inaccessible. > > The problem is how to achieve that. > > The result of __va(PFN_PHYS(pfn) is the virtual address where the poison > page is currently mapped into the kernel. That value gets put into > register %rdi to make the call to set_memory_uc() (which goes on to > call a bunch of other functions passing the virtual address along > the way). > > Now imagine an impatient super-speculative processor is waiting for > some result to decide where to jump next, and picks a path that isn't > going to be taken ... out in the weeds somewhere it runs into: > > movzbl (%rdi), %eax > > Oops ... now you just read from the address you were trying to > avoid. So we log an error. Eventually the speculation gets sorted > out and the processor knows not to signal a machine check. But the > log is sitting in a machine check bank waiting to cause an overflow > if we try to log a second error. > > The decoy_addr trick in mce_unmap_kpfn() throws in the high bit > to the address passed. The set_memory_np() code (and I assume the > set_memory_uc()) code ignores it, but it means any stray speculative > access won't point at the poison page. > > -Tony > > Note: this is *mostly* a problem if the poison is in the first > cache line of the page. But you could hit other lines if the > instruction you speculatively ran into had the right offset. E.g. > to hit the third line: > > movzbl 128(%rdi), %eax Ok, makes sense and I do see now that this decoy resolves to the same physical address once PTE_PFN_MASK is applied when we start messing with page tables. However, set_memory_uc() is currently not prepared for this trick as it specifies the unmasked physical address to reserve_memtype(). ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE, _PAGE_CACHE_MODE_UC_MINUS, NULL); ...compared to set_memory_np() which does not manipulate the memtype tracking. I'll fix up reserve_memtype() and free_memtype() to be prepared for decoy addresses.