On 10/01/2013 10:24 AM, Ralf Baechle wrote:
> On Mon, Sep 30, 2013 at 11:13:45PM -0500, Steven J. Hill wrote:
>> This patch fixes MIPS HIGHMEM for cache aliasing and non-DMA device
>> I/O. It does the following:
>>
>> 1. Uses only coloured page addresses while allocating via kmap*(); a
>>    page address in the HIGHMEM zone matches the kernel address by
>>    colour. This allows easy re-allocation before flushing the cache.
>>    It is done for kmap() and kmap_atomic().
>> 2. Fixes the instruction cache flush to use an address of the right
>>    colour via kmap_coherent() in case of I-cache aliasing.
>> 3. Flushes the D-cache before a page is handed to a process as an
>>    I-page. This is required for swapped-in pages in case of non-DMA
>>    I/O.
>> 4. Some optimization is done... more comes.
>>
>> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>> Signed-off-by: Steven J. Hill <Steven.Hill@xxxxxxxxxx>
>> (cherry picked from commit bf252e2d224256b3f34683bff141d6951b3cae6a)
>>
>> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> index 17cc7ff..ed835b2 100644
>> --- a/arch/mips/Kconfig
>> +++ b/arch/mips/Kconfig
>> @@ -330,6 +330,7 @@ config MIPS_MALTA
>>  	select SYS_SUPPORTS_MULTITHREADING
>>  	select SYS_SUPPORTS_SMARTMIPS
>>  	select SYS_SUPPORTS_ZBOOT
>> +	select SYS_SUPPORTS_HIGHMEM
>>  	help
>>  	  This enables support for the MIPS Technologies Malta evaluation
>>  	  board.
>> @@ -2064,6 +2065,7 @@ config CPU_R4400_WORKAROUNDS
>>  config HIGHMEM
>>  	bool "High Memory Support"
>>  	depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM
>> +	depends on ( !SMP || NR_CPUS = 1 || NR_CPUS = 2 || NR_CPUS = 3 || NR_CPUS = 4 || NR_CPUS = 5 || NR_CPUS = 6 || NR_CPUS = 7 || NR_CPUS = 8 )
>
> Why does highmem not work for more than 8 CPUs?

Colouring requires much more kernel virtual address space, and
kmap_atomic() uses space per CPU, so some trade-off is needed because
the kmap_atomic space may hit the top of the kmap() space. NR_CPUS == 8
is a good trade-off, at least for now. We may redesign it later, but
that would require splitting the idle task PTE table on a per-CPU
basis, or some other means of separating the PTE tables. Today each
CPU's kmap_atomic area is a separate zone in kernel virtual address
space, and this patch multiplies that by the number of colours.

>> +#ifndef cpu_has_ic_aliases
>> +#define cpu_has_ic_aliases	(cpu_data[0].icache.flags & MIPS_CACHE_ALIASES)
>> +#endif
>>  #ifndef cpu_has_dc_aliases
>>  #define cpu_has_dc_aliases	(cpu_data[0].dcache.flags & MIPS_CACHE_ALIASES)
>>  #endif
>>
>> diff --git a/arch/mips/include/asm/fixmap.h b/arch/mips/include/asm/fixmap.h
>> index dfaaf49..a690f05 100644
>> --- a/arch/mips/include/asm/fixmap.h
>> +++ b/arch/mips/include/asm/fixmap.h
>> @@ -46,7 +46,19 @@
>>   * fix-mapped?
>>   */
>>  enum fixed_addresses {
>> +
>> +/* must be <= 8, last_pkmap_nr_arr[] is initialized to 8 elements,
>> +   keep the total L1 size <= 512KB with 4 ways */
>> +#ifdef CONFIG_PAGE_SIZE_64KB
>> +#define FIX_N_COLOURS 2
>> +#endif
>> +#ifdef CONFIG_PAGE_SIZE_32KB
>> +#define FIX_N_COLOURS 4
>> +#endif
>> +#ifndef FIX_N_COLOURS
>>  #define FIX_N_COLOURS 8
>> +#endif
>
> Dark black magic alert?

It is the same space problem - this just limits the possible number of
colours if the CPU has a big L1. There is a panic() call in the cache
probe which verifies the actual number of required colours against the
L1 sizes, and it is based on FIX_N_COLOURS.

> Normally anything bigger than a 16K page size doesn't suffer from
> aliases at all, so it can be considered as having just a single page
> colour.

Well, I don't trust HW engineers, so I would like to avoid that kind of
conclusion :) At least I have on my desk some cores with a 64KB L1 and
still cache aliasing.
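To make the space argument concrete, here is roughly the arithmetic -
my illustration only, not code from the patch. WAY_SIZE comes from the
"512KB with 4 ways" comment in the fixmap.h hunk, and the kmap_atomic()
nesting depth of 8 is the assumption discussed further down:

#include <stdio.h>

#define WAY_SIZE          (512 * 1024 / 4)  /* 512KB L1, 4 ways -> 128KB/way */
#define KMAP_ATOMIC_DEPTH 8                 /* assumed max kmap_atomic() nesting */

/* Page colours of a VIPT cache: way size / page size, capped at 8. */
static long fix_n_colours(long page_size)
{
	long c = WAY_SIZE / page_size;
	return c > 8 ? 8 : c;
}

int main(void)
{
	long page_sizes[] = { 4096, 16384, 32768, 65536 };
	int ncpus = 8;                      /* the Kconfig limit above */

	/* Prints 8, 8, 4 and 2 colours - matching the #defines. */
	for (int i = 0; i < 4; i++) {
		long colours = fix_n_colours(page_sizes[i]);
		printf("%2ldKB pages: %ld colours, %ld kmap_atomic PTEs\n",
		       page_sizes[i] / 1024, colours,
		       (long)KMAP_ATOMIC_DEPTH * ncpus * colours);
	}
	return 0;
}

Each of those PTEs is one fixmap slot, which is why the colour count
has to shrink as the page size grows.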
>> +	FIX_CMAP_BEGIN,
>>  #ifdef CONFIG_MIPS_MT_SMTC
>>  	FIX_CMAP_END = FIX_CMAP_BEGIN + (FIX_N_COLOURS * NR_CPUS * 2),
>> @@ -56,7 +68,7 @@ enum fixed_addresses {
>>  #ifdef CONFIG_HIGHMEM
>>  	/* reserved pte's for temporary kernel mappings */
>>  	FIX_KMAP_BEGIN = FIX_CMAP_END + 1,
>> -	FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
>> +	FIX_KMAP_END = FIX_KMAP_BEGIN+(8*NR_CPUS*FIX_N_COLOURS)-1,
>
> I don't understand why this would need to be extended for dealing with
> highmem cache aliases.  Please explain.

See above - kmap_atomic() uses space per CPU. I just replaced
KM_TYPE_NR by the maximum number 8 - the maximum stack depth for
kmap_atomic() - and multiplied by the maximum number of colours. I use
the maximum of 8 because I suspect it never goes above 3 (I tried hard
but never saw more than 2). Anyway, using KM_TYPE_NR is overkill now,
because locking is no longer done by KM_TYPE but by a simple stack (see
the kmap_atomic_idx() function). It probably makes sense to change the
BUG_ON(idx > KM_TYPE_NR) in kmap_atomic_idx() to BUG_ON(idx > 8) and
replace the constant with a symbolic name, but the patch was originally
written on 2.6.32.15 and included a backported kmap_atomic_idx_push(),
and that check was inadvertently carried over.

See above about the 64KB aliasing cache. I hope for a good future and a
512KB L1... but it may still be aliasing.

>> +/* 8 colors pages are here */
>> +#ifdef CONFIG_PAGE_SIZE_4KB
>> +#define LAST_PKMAP 4096
>> +#endif
>> +#ifdef CONFIG_PAGE_SIZE_8KB
>> +#define LAST_PKMAP 2048
>> +#endif
>> +#ifdef CONFIG_PAGE_SIZE_16KB
>> +#define LAST_PKMAP 1024
>> +#endif
>> +
>> +/* 32KB and 64KB pages should have 4 and 2 colors to keep space under control */
>
> Again I'm wondering why this large number of colours where you should
> need just one.  Is this related to the 74K / 1074K erratum workaround
> we've recently been mailing about?

No.

>> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
>> index 341609c..d7acabb 100644
>> --- a/arch/mips/mm/c-r4k.c
>> +++ b/arch/mips/mm/c-r4k.c
>> @@ -409,8 +409,11 @@ static inline void local_r4k_flush_cache_range(void * args)
>>  		return;
>>
>>  	r4k_blast_dcache();
>> -	if (exec)
>> +	if (exec) {
>> +		if (!cpu_has_ic_fills_f_dc)
>> +			wmb();
>>  		r4k_blast_icache();
>> +	}
>
> Why the wmb() here?  I think the implied barrier by the ERET before
> userland execution in that page can start I-cache refilling, should
> suffice?

Well, after flushing the L1D cache, that CACHE instruction should be
completed before we invalidate the same L1I cache line with another
CACHE. The SYNC is needed, and the Architecture Vol II (CACHE)
explicitly states that: "For implementations which implement multiple
level of caches without the inclusion property, the use of a SYNC
instruction after the CACHE instruction is still needed whenever
writeback data has to be resident in the next level of memory
hierarchy." (The hazard barrier in ERET is needed too.)

The HW team swears to me that a hazard barrier is not enough, because
the D and I caches are independent of the CPU, and a race condition may
occur here: the I-cache may be filled before the D-cache flush is
completed. A hazard barrier just flushes the core pipelines; it does
not revert the effect of an I-fetch into L1I forced by the core's
prediction mechanics before the hazard barrier.

The newest cores from MIPS (I mean IMG :) with CM/CM2 don't need that
because (from my records): "CM/CM2 doesn't delay the second cache flush
(I-cache HInval). However, after a hazard barrier the instruction fetch
on the same address as the D-cache HWBInv will be delayed until it is
finished." That is a coherency feature of CM/CM2 which can be absent on
other cores, so a SYNC is needed between the two CACHEs. But this
optimisation is not pushed to LMO yet.
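For the record, the sequence the SYNC enforces looks like this for a
single line - a sketch only; the patch does it with the r4k_blast_*()
loops, and wmb() is a SYNC on these cores. The cache op encodings 0x15
and 0x10 are Hit_Writeback_Inv_D and Hit_Invalidate_I:

/* Sketch: push the new instructions out of L1D, then kill the stale
 * L1I line - the SYNC in the middle is what the wmb() provides. */
static inline void flush_line_for_exec(unsigned long addr)
{
	__asm__ __volatile__(
	"	cache	0x15, 0(%0)	\n"  /* Hit_Writeback_Inv_D */
	"	sync			\n"  /* writeback must complete */
	"	cache	0x10, 0(%0)	\n"  /* Hit_Invalidate_I */
	: : "r" (addr) : "memory");
}

Without the SYNC, nothing orders the two CACHE operations against each
other on cores lacking the CM/CM2 behaviour described above.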
>>  }
>>
>>  static void r4k_flush_cache_range(struct vm_area_struct *vma,
>> @@ -474,6 +477,7 @@ static inline void local_r4k_flush_cache_page(void *args)
>>  	pmd_t *pmdp;
>>  	pte_t *ptep;
>>  	void *vaddr;
>> +	int dontflash = 0;
>
> I think you mean 'dontflush'.

Yes, thank you.
Yes, it is right below:

>>  	}
>>
>>  	if (vaddr) {
>> @@ -532,6 +544,13 @@ static inline void local_r4k_flush_cache_page(void *args)
>>  		else
>>  			kunmap_atomic(vaddr);
>>  	}
>> +
>> +	/* in case of I-cache aliasing - blast it via coherent page */
>> +	if (exec && cpu_has_ic_aliases && (!dontflash) && !map_coherent) {
>
>                                           ^^^^^^^^^^^^
> Useless parentheses.
>
>> +		vaddr = kmap_coherent(page, addr);
>> +		r4k_blast_icache_page((unsigned long)vaddr);
>> +		kunmap_coherent();
>> +	}
>>  }
>>
>>  static void r4k_flush_cache_page(struct vm_area_struct *vma,
>> @@ -544,6 +563,8 @@ static void r4k_flush_cache_page(struct vm_area_struct *vma,
>>  	args.pfn = pfn;
>>
>>  	r4k_on_each_cpu(local_r4k_flush_cache_page, &args);
>> +	if (cpu_has_dc_aliases)
>> +		ClearPageDcacheDirty(pfn_to_page(pfn));
>
> This also is just a performance tweak?  If so it should also go into a
> separate patch.

No, it should be done for correctness. It is related to the arch dirty
bit handling, which was fixed too. Unfortunately, it was done 2.5 years
ago and was merged into the HIGHMEM patch because the patch requires
it. Sorry.
No, it is again a correctness issue in the arch dirty bit handling.
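To spell out the dirty-bit protocol this keeps consistent - a sketch
under my own naming, where only SetPageDcacheDirty() and
ClearPageDcacheDirty() are the real arch macros and the other helpers
are hypothetical:

/* flush_dcache_page() may defer the real flush when the page has no
 * user mapping yet, recording that fact in an arch page flag: */
void sketch_flush_dcache_page(struct page *page)
{
	if (!page_mapped_anywhere(page)) {	/* hypothetical helper */
		SetPageDcacheDirty(page);	/* defer: flush later */
		return;
	}
	writeback_dcache_page(page);		/* hypothetical helper */
	ClearPageDcacheDirty(page);		/* now really clean */
}

So once r4k_flush_cache_page() has really flushed the page on every
CPU, the flag must be cleared too, as the hunk does - otherwise later
code would still believe the page has stale D-cache lines, and the arch
dirty state would be wrong.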
SYNC is required - see the note above on the L1D/L1I cache flush and
the Architecture Vol II statement.

- Leonid.