On Wed, 2015-03-04 at 01:00 +0000, Andrew Morton wrote:
> On Tue, 03 Mar 2015 16:14:32 -0700 Toshi Kani <toshi.kani@xxxxxx> wrote:
>
> > On Tue, 2015-03-03 at 14:44 -0800, Andrew Morton wrote:
> > > On Tue, 3 Mar 2015 10:44:24 -0700 Toshi Kani <toshi.kani@xxxxxx> wrote:
> >  :
> > > > +
> > > > +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> > > > +int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> > > > +{
> > > > +	u8 mtrr;
> > > > +
> > > > +	/*
> > > > +	 * Do not use a huge page when the range is covered by non-WB type
> > > > +	 * of MTRRs.
> > > > +	 */
> > > > +	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> > > > +	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
> > > > +		return 0;
> > >
> > > It would be good to notify the operator in some way when this happens.
> > > Otherwise the kernel will run more slowly and there's no way of knowing
> > > why.  I guess slap a pr_info() in there.  Or maybe pr_warn()?
> >
> > We only use 4KB mappings today, so this case will not make it run
> > slowly, i.e. it will be the same as today.
>
> Yes, but it would be slower than it would be if the operator fixed the
> mtrr settings!  How do we let the operator know this?
>
> > Also, adding a message here
> > can generate a lot of messages when MTRRs cover a large area.
>
> Really?  This is only going to happen when a device driver requests a
> huge io mapping, isn't it?  That's rare.  We could emit a warning,
> return an error code and fall all the way back to the top-level ioremap
> code which can then retry with 4k mappings.  Or something similar -
> somehow record the fact that this warning has been emitted or use
> printk ratelimiting (bad option).

Yes, an IO device with a huge MMIO space that is covered by MTRRs is a
rare case.  BIOS does not need to specify how MMIO of each card needs to
be accessed with MTRRs (or BIOS should not do it since an MMIO address
is configurable on each card).

However, PCIe has the MMCONFIG space, PCIe config space, which is also
memory mapped and must be accessed with UC.  The PCI subsystem calls
ioremap_nocache() to map the entire MMCONFIG space, which covers the
PCIe config space of all possible cards.  Here are boot messages on my
test system.

  :
 PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
 PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved in E820
  :

And MTRRs cover this MMCONFIG space with UC to assure that the range is
always accessed with UC.

 # cat /proc/mtrr
 reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable

So, if we add a message into the code, it will be displayed many times
in this ioremap_nocache() call from PCI.

Ideally, pud_set_huge() and pmd_set_huge() should allow using a huge
page mapping when the entire map range is covered by a single MTRR
entry, which is the case with MMCONFIG.  But I did not include such
handling into the patch because UC map is slow by itself, MMCONFIG is
only accessed at boot-time, and mtrr_type_lookup() does not provide the
level of info necessary.

Thanks,
-Toshi
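
For illustration only, here is a userspace sketch of the difference
between the check in the patch and the "ideal" check described above.
It is not part of the patch and it is not kernel code.  struct
mtrr_entry, lookup_type(), patch_allows_huge() and ideal_allows_huge()
are hypothetical names made up for this sketch, the "uniform" output is
the extra info that mtrr_type_lookup() is assumed to be missing, and
the default type outside any MTRR entry is assumed to be WB.  Only the
first overlapping entry is considered, which is enough for the
single-entry table shown by /proc/mtrr above.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory type encodings as used by x86 MTRRs. */
#define MTRR_TYPE_UNCACHABLE	0x00
#define MTRR_TYPE_WRBACK	0x06
#define MTRR_TYPE_INVALID	0xFF	/* no MTRR information available */

#define PMD_SIZE	(1ULL << 21)	/* 2MB */
#define PUD_SIZE	(1ULL << 30)	/* 1GB */

/* Hypothetical model of one variable-range MTRR (not a kernel struct). */
struct mtrr_entry {
	uint64_t base;
	uint64_t size;
	uint8_t type;
};

/* Models reg00 from /proc/mtrr: 1024MB at 3072MB, uncachable. */
const struct mtrr_entry boot_mtrrs[] = {
	{ 0xc0000000ULL, 1024ULL << 20, MTRR_TYPE_UNCACHABLE },
};

/*
 * Hypothetical lookup for [start, end).  Returns the type of the first
 * overlapping entry and clears *uniform when that entry covers only
 * part of the range.
 */
uint8_t lookup_type(const struct mtrr_entry *tbl, size_t n,
		    uint64_t start, uint64_t end, int *uniform)
{
	size_t i;

	*uniform = 1;
	for (i = 0; i < n; i++) {
		uint64_t mstart = tbl[i].base;
		uint64_t mend = mstart + tbl[i].size;

		if (end <= mstart || start >= mend)
			continue;	/* no overlap with this entry */
		if (start < mstart || end > mend)
			*uniform = 0;	/* range is only partly covered */
		return tbl[i].type;
	}
	return MTRR_TYPE_WRBACK;	/* assumed default type */
}

/* Check as in the patch: refuse a huge page for any non-WB type. */
int patch_allows_huge(const struct mtrr_entry *tbl, size_t n,
		      uint64_t addr, uint64_t size)
{
	int uniform;
	uint8_t type = lookup_type(tbl, n, addr, addr + size, &uniform);

	return type == MTRR_TYPE_WRBACK || type == MTRR_TYPE_INVALID;
}

/* Ideal check: allow a huge page whenever the range has a single type. */
int ideal_allows_huge(const struct mtrr_entry *tbl, size_t n,
		      uint64_t addr, uint64_t size)
{
	int uniform;
	uint8_t type = lookup_type(tbl, n, addr, addr + size, &uniform);

	if (type == MTRR_TYPE_INVALID)
		return 1;
	return uniform;
}
```

With boot_mtrrs, a PUD-sized range at 0xc0000000 is refused by
patch_allows_huge() because the type is UC, while ideal_allows_huge()
accepts it because reg00 covers the whole 1GB.  If the entry were only
256MB, the same PUD-sized range would be partly covered and both checks
would refuse it.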