On Wed, 1 Feb 2017, John David Anglin wrote:

> On 2017-02-01 3:10 PM, Mikulas Patocka wrote:
> > > I'm not 100% convinced that 4.9 is fully stable and that the patch
> > > is the reason for the crashes you see.
> > > What kind of crashes do you see? Userspace or kernel ?
> > Userspace crashes. Random crashes or internal errors in gcc when compiling
> > the kernel. I once had "aptitude" crash.
> The userspace crashes are present in 4.8 and 4.9 as well. For example, this
> build failed due to an OS problem:
> https://buildd.debian.org/status/fetch.php?pkg=kdenlive&arch=hppa&ver=16.12.1-2&stamp=1485956026&raw=0
>
> Probably, 10% or more large packages fail to build because of this. Note
> that this only occurs on machines (e.g., c8000) that only support
> equivalent aliases. We don't see this on the parisc buildd which has two
> PA8600 CPUs.
>
> My current theory is the following functions are buggy:
>
> /* vmap range flushes and invalidates. Architecturally, we don't need
>  * the invalidate, because the CPU should refuse to speculate once an
>  * area has been flushed, so invalidate is left empty */
> static inline void flush_kernel_vmap_range(void *vaddr, int size)
> {
>         unsigned long start = (unsigned long)vaddr;
>
>         flush_kernel_dcache_range_asm(start, start + size);
> }
> static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> {
>         unsigned long start = (unsigned long)vaddr;
>         void *cursor = vaddr;
>
>         for ( ; cursor < vaddr + size; cursor += PAGE_SIZE) {
>                 struct page *page = vmalloc_to_page(cursor);
>
>                 if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
>                         flush_kernel_dcache_page(page);
>         }
>         flush_kernel_dcache_range_asm(start, start + size);
> }

BTW, if you flush a cache line, then - according to the PA-RISC
specification - the page stays in the TLB and the CPU can speculatively
fetch anything that is in the TLB. So, such a flush could really have no
effect.
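To make the ordering concern concrete, here is a toy userspace model in C.
It is only a sketch under simplifying assumptions: it is not kernel code, it
does not use any parisc primitive, and every name in it (toy_cpu, speculate,
flush_line, purge_tlb, flush_only, purge_then_flush) is hypothetical. It
assumes one page, one cache line, and that a speculative fetch can happen at
any time as long as a translation for the page is present.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_cpu {
	bool tlb_valid;   /* a translation for the page is in the TLB */
	bool line_cached; /* a line of the page sits in the data cache */
};

/* Assumption: speculation may pull the line in whenever a translation exists. */
static void speculate(struct toy_cpu *cpu)
{
	assert(cpu);
	if (cpu->tlb_valid)
		cpu->line_cached = true;
}

static void flush_line(struct toy_cpu *cpu)
{
	cpu->line_cached = false;
}

static void purge_tlb(struct toy_cpu *cpu)
{
	cpu->tlb_valid = false;
}

/* Flush only: the translation survives, so the line can come straight back. */
bool flush_only(void)
{
	struct toy_cpu cpu = { .tlb_valid = true, .line_cached = true };

	flush_line(&cpu);
	speculate(&cpu); /* may happen at any point after the flush */
	return cpu.line_cached;
}

/* Purge the translation first, then flush: nothing can be refetched. */
bool purge_then_flush(void)
{
	struct toy_cpu cpu = { .tlb_valid = true, .line_cached = true };

	purge_tlb(&cpu);
	flush_line(&cpu);
	speculate(&cpu);
	return cpu.line_cached;
}
```

In this model flush_only() returns true (the line is back in the cache after
the flush) and purge_then_flush() returns false. Real hardware is of course
far more involved; the model only shows why the order of the two steps
matters.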
The kernel should first flush TLB for the affected range and then flush
the data using the tmpalias mapping.

Mikulas

> The kernel sets up a vmap range for I/O and we have non equivalent aliases
> to the offset map pages. I know the PG_dcache_dirty is never set when
> these routines are called, so the for loop does nothing. Nuking the whole
> data cache appears to fix the application errors but my test was cut short
> by a second problem. No one else seems to do anything with offset map, so
> we might have a parisc specific driver problem.
>
> We also have a down_read/up_read problem where applications stall forever
> and are not killable (D state in top). Some seemed related to signal
> processing but they have occurred in other situations as well. They seem
> more prevalent. For example, I can't remember this happening with 3.18
> branch. This problem seems to be triggered by application tests involving
> multiple threads (glibc, gcc go and libgomp, and mariadb).
>
> Dave
>
> --
> John David Anglin dave.anglin@xxxxxxxx