On 5/17/22 16:26, John David Anglin wrote: > On 2022-05-17 9:24 a.m., Helge Deller wrote: >> On 5/17/22 15:19, Rolf Eike Beer wrote: >>> Am Montag, 16. Mai 2022, 23:49:10 CEST schrieb Helge Deller: >>>> On 5/16/22 23:28, Rolf Eike Beer wrote: >>>>> Am Montag, 16. Mai 2022, 17:14:47 CEST schrieb John David Anglin: >>>>>> Originally, I was convinced that we needed to use tmpalias flushes >>>>>> everwhere, for both user and kernel flushes. However, when I modified >>>>>> flush_kernel_dcache_page_addr, to use a tmpalias flush, my c8000 >>>>>> would crash quite early when booting. >>>>>> >>>>>> The PDC returns alias values of 0 for the icache and dcache. This >>>>>> indicates that either the alias boundary is greater than 16MB or >>>>>> equivalent aliasing doesn't work. I modified the tmpalias code to >>>>>> make it easy to try alternate boundaries. I tried boundaries up to >>>>>> 128MB but still kernel tmpalias flushes didn't work on c8000. >>>>>> >>>>>> This led me to conclude that tmpalias flushes don't work on PA8800 >>>>>> and PA8900 machines, and that we needed to flush directly using the >>>>>> virtual address of user and kernel pages. This is likely the major >>>>>> cause of instability on the c8000 and rp34xx machines. >>>>>> >>>>>> Flushing user pages requires doing a temporary context switch as we >>>>>> have to flush pages that don't belong to the current context. Further, >>>>>> we have to deal with pages that aren't present. If a page isn't >>>>>> present, the flush instructions fault on every line. >>>>>> >>>>>> Other code has been rearranged and simplified based on testing. For >>>>>> example, I introduced a flush_cache_dup_mm routine. flush_cache_mm >>>>>> and flush_cache_dup_mm differ in that flush_cache_mm calls >>>>>> purge_cache_pages and flush_cache_dup_mm calls flush_cache_pages. >>>>>> In some implementations, pdc is more efficient than fdc. Based on >>>>>> my testing, I don't believe there's any performance benefit on the >>>>>> c8000. >>>>>> >>>>>> V2: >>>>>> 1) Add flush_cache_page_check_pte routine. >>>>>> 2) Use it in copy_to_user_page and copy_from_user_page. >>>>>> 3) flush_anon_page moved to cache.c and updated. >>>>>> 4) Changed alias boundary to 64 MB for 64-bit kernels. Revised comment >>>>>> >>>>>> regarding alias boundary for PA8800/PA8900 processors. >>>>>> >>>>>> 5) Removed struct mm_struct * argument from flush_cache_pages. >>>>>> 6) Fixed thinko in flush_cache_range. It increased the number of pages >>>>>> >>>>>> flushed and slowed performance. >>>>>> >>>>>> 7) Removed sync changes from pacache.S. >>>>>> >>>>>> V3: >>>>>> 1) copy_to_user_page and copy_from_user_page moved to cache.c to >>>>>> >>>>>> improve inlining. >>>>>> >>>>>> 2) Replaced copy_user_page with copy_user_highpage. >>>>>> 3) Fixed cache threshold calculation on 32-bit kernels. >>>>>> 4) Don't warn on inequivalent private mappings in flush_dcache_page. >>>>>> 5) Return early from mm_total_size if size exceeds >>>>>> >>>>>> parisc_cache_flush_threshold. >>>>>> >>>>>> 6) Disable non-access TLB warning in handle_nadtlb_fault. Warning >>>>>> >>>>>> happens occassionally handling flushes for COW faults. >>>>>> >>>>>> 7) Remove flush_cache_dup_mm. >>>>>> 8) Flush entire cache in flush_cache_mm and flush_cache_range on >>>>>> >>>>>> processors with aliasing caches. Only flush small cache ranges >>>>>> on machines with PA8800/PA8900 processors. >>>>>> >>>>>> 9) Tested on rp3440, c8000 and c3750. >>>>> Given how long these changelogs are, and how fragile the whole caching is >>>>> I >>>>> think it is a good idea to split this patch into smaller ones, to improve >>>>> readability and being able to bisect it. >>>> FWIW, I've done some cleanups to this patch and committed it to my for-next >>>> tree. In case it's split up, please use the revised version. >>> Why did you modify get_ptep()? Until now it was just moved around in the file, >>> and IMHO it becomes less readable because all these needless variables are >>> batched up at the start of the function now. The only point I would see in >>> moving them all to the front is if there would be no nesting anymore, and >>> every condition was inverted: >> Dave's original patch did not moved the variables to the beginning. >> That change was me - just because checkpatch complained otherwise. >> >> I agree that it's less readable. > The get_ptep code was originally based on the vmalloc_to_page code in vmalloc.c. > vmalloc_to_page code. This code has now changed, so I think get_ptep needs updating. > See get_old_pud in mremap.c. It looks good to me. So, what's the consense of this discussion now? I can easily split out the pr_warn("WARNING"). Moving the get_ptep() back to the original place seems ok, and I'll keep the strange indenting which checkpatch want. Is that ok? Helge