Re: crashes in 4.10 because of "parisc: Enable KASLR"

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Fri, 8 Dec 2017 06:22:38 -0500 (EST)

On Wed, 1 Feb 2017, John David Anglin wrote:

> On 2017-02-01 3:10 PM, Mikulas Patocka wrote:
> > > I'm not 100% convinced that 4.9 is fully stable and that the patch
> > > is the reason for the crashes you see.
> > > What kind of crashes do you see? Userspace or kernel ?
> > Userspace crashes. Random crashes or internal errors in gcc when compiling
> > the kernel. I once had "aptitude" crash.
> The userspace crashes are present in 4.8 and 4.9 as well.  For example, this
> build failed due to an OS problem:
> https://buildd.debian.org/status/fetch.php?pkg=kdenlive&arch=hppa&ver=16.12.1-2&stamp=1485956026&raw=0
> 
> Probably 10% or more of large packages fail to build because of this. Note
> that this only occurs on machines (e.g., c8000) that only support equivalent
> aliases.  We don't see this on the parisc buildd, which has two PA8600 CPUs.
> 
> My current theory is the following functions are buggy:
> 
> /* vmap range flushes and invalidates.  Architecturally, we don't need
>  * the invalidate, because the CPU should refuse to speculate once an
>  * area has been flushed, so invalidate is left empty */
> static inline void flush_kernel_vmap_range(void *vaddr, int size)
> {
>         unsigned long start = (unsigned long)vaddr;
> 
>         flush_kernel_dcache_range_asm(start, start + size);
> }
> static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> {
>         unsigned long start = (unsigned long)vaddr;
>         void *cursor = vaddr;
> 
>         for ( ; cursor < vaddr + size; cursor += PAGE_SIZE) {
>                 struct page *page = vmalloc_to_page(cursor);
> 
>                 if (test_and_clear_bit(PG_dcache_dirty, &page->flags))
>                         flush_kernel_dcache_page(page);
>         }
>         flush_kernel_dcache_range_asm(start, start + size);
> }

BTW, if you flush a cache line, the page stays in the TLB, and according
to the PA-RISC specification the CPU can speculatively fetch anything that
is mapped in the TLB. So such a flush may have no lasting effect: the line
can be fetched back into the cache right after it was flushed.

The kernel should first flush the TLB entries for the affected range and
then flush the data using the tmpalias mapping.

Mikulas

> The kernel sets up a vmap range for I/O, and we have non-equivalent aliases
> to the offset map pages.  I know that PG_dcache_dirty is never set when these
> routines are called, so the for loop does nothing.  Nuking the whole data
> cache appears to fix the application errors, but my test was cut short by a
> second problem.  No one else seems to do anything with the offset map, so we
> might have a parisc-specific driver problem.
> 
> We also have a down_read/up_read problem where applications stall forever
> and are not killable (D state in top).  Some seemed related to signal
> processing, but they have occurred in other situations as well.  They seem
> more prevalent than before; for example, I can't remember this happening
> with the 3.18 branch.  This problem seems to be triggered by multithreaded
> application tests (glibc, gcc go and libgomp, and mariadb).
> 
> Dave
> 
> -- 
> John David Anglin  dave.anglin@xxxxxxxx
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html