Livelock in handle_pte_fault [Was: Re: timerfd read does not return]

Stanislav Meduna <stano@xxxxxxxxxx> · Tue, 14 May 2013 10:31:09 +0200

On 13.05.2013 10:05, Stanislav Meduna wrote:

> 0d...0 62811.755394: function:  do_page_fault
> 0....0 62811.755396: function:     handle_mm_fault
> 0....0 62811.755398: function:        handle_pte_fault
> 0d...0 62811.755402: function:  do_page_fault
> 0....0 62811.755404: function:     handle_mm_fault
> 0....0 62811.755406: function:        handle_pte_fault

The flags in the page fault handler are 0x28, which, if I
understand it correctly, is
FAULT_FLAG_KILLABLE | FAULT_FLAG_ALLOW_RETRY.
The faulting address is indeed the one on the stack that worked
for hours before; it is mlockall()-ed and I have (of course)
no swap. I will add some code to print the content of
the offending pte.
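
For reference, the flag decoding above can be checked with a
small userspace helper. This is only a sketch: the FAULT_FLAG_*
values are assumed to match include/linux/mm.h of 3.x-era
kernels and should be verified against the tree in use, and
decode_fault_flags() is a hypothetical name, not a kernel
function.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values: FAULT_FLAG_* as found in include/linux/mm.h of
 * 3.x-era kernels. Verify against the tree actually in use. */
#define FAULT_FLAG_WRITE        0x01
#define FAULT_FLAG_NONLINEAR    0x02
#define FAULT_FLAG_MKWRITE      0x04
#define FAULT_FLAG_ALLOW_RETRY  0x08
#define FAULT_FLAG_RETRY_NOWAIT 0x10
#define FAULT_FLAG_KILLABLE     0x20

struct flag_name {
	unsigned int bit;
	const char *name;
};

static const struct flag_name fault_flag_names[] = {
	{ FAULT_FLAG_WRITE,        "WRITE" },
	{ FAULT_FLAG_NONLINEAR,    "NONLINEAR" },
	{ FAULT_FLAG_MKWRITE,      "MKWRITE" },
	{ FAULT_FLAG_ALLOW_RETRY,  "ALLOW_RETRY" },
	{ FAULT_FLAG_RETRY_NOWAIT, "RETRY_NOWAIT" },
	{ FAULT_FLAG_KILLABLE,     "KILLABLE" },
};

/* Hypothetical helper: write the names of all known set bits into
 * buf, separated by '|'. Returns the number of known flags found;
 * unknown bits are ignored. */
static int decode_fault_flags(unsigned int flags, char *buf, size_t len)
{
	size_t n = sizeof(fault_flag_names) / sizeof(fault_flag_names[0]);
	size_t i;
	int found = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < n; i++) {
		if (!(flags & fault_flag_names[i].bit))
			continue;
		if (found)
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, fault_flag_names[i].name, len - strlen(buf) - 1);
		found++;
	}
	return found;
}
```

Under these assumed values, 0x28 has exactly two bits set and
FAULT_FLAG_WRITE is not among them, i.e. the repeating fault is
not flagged as a write access.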

The code in handle_pte_fault proceeds through the
  entry = pte_mkyoung(entry);
line and the following
  ptep_set_access_flags
returns zero. This repeats ad nauseam with nothing else running
in between. I will add some tracing prints to output
the content of the pte.
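
The sequence described above can be modelled in userspace to show
why a zero return matters. This is a simplified sketch, not the
kernel code: the model_* names are hypothetical, the accessed-bit
value assumes the x86 page table layout, and the rule that the
TLB is flushed for a spurious fault only when FAULT_FLAG_WRITE is
set is my reading of mm/memory.c in kernels of that era.

```c
#include <stdint.h>

#define MODEL_PTE_ACCESSED 0x20u /* assumed: x86 "accessed" bit */
#define MODEL_FLAG_WRITE   0x01u /* assumed: FAULT_FLAG_WRITE */

typedef uint64_t pte_model_t;

enum fault_outcome {
	PTE_UPDATED,          /* pte changed, fault was genuine */
	SPURIOUS_FLUSHED,     /* pte unchanged, TLB entry flushed */
	SPURIOUS_NOT_FLUSHED, /* pte unchanged, nothing done */
};

/* Model of pte_mkyoung(): mark the entry as accessed. */
static pte_model_t model_mkyoung(pte_model_t pte)
{
	return pte | MODEL_PTE_ACCESSED;
}

/* Simplified model of ptep_set_access_flags(): store the new entry
 * if it differs from the current one and report whether anything
 * changed. The real x86 version has additional conditions. */
static int model_set_access_flags(pte_model_t *ptep, pte_model_t entry)
{
	int changed = (*ptep != entry);

	if (changed)
		*ptep = entry;
	return changed;
}

/* Model of the tail of handle_pte_fault() for a present pte. */
static enum fault_outcome model_fault_tail(pte_model_t *ptep,
					   unsigned int flags)
{
	pte_model_t entry = model_mkyoung(*ptep);

	if (model_set_access_flags(ptep, entry))
		return PTE_UPDATED;

	/* The pte already permits the access, so the fault is
	 * spurious (for example a stale TLB entry). */
	if (flags & MODEL_FLAG_WRITE)
		return SPURIOUS_FLUSHED;
	return SPURIOUS_NOT_FLUSHED;
}
```

In this model, a pte that is already young faulting with flags
0x28 ends in SPURIOUS_NOT_FLUSHED every time: the handler changes
nothing, returns, and the same access can fault again.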

Adding flush_tlb_page(vma, address) at the beginning of
handle_pte_fault does not change anything.

The length of the hang could correlate with the time until
some SCHED_OTHER process is scheduled after the RT throttler
activates. There is a process running every 2 seconds and the
length of the hang is usually between 1 and 3 seconds. This
is not (yet) verified.
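
To judge how much room the RT throttler leaves for SCHED_OTHER
tasks, the two sysctls that control it can be read and compared.
A minimal sketch, assuming the standard
/proc/sys/kernel/sched_rt_period_us and
/proc/sys/kernel/sched_rt_runtime_us files; read_long_from() and
rt_throttled_us() are hypothetical helper names.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a file.
 * Returns 0 on success, -1 on failure. */
static int read_long_from(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Microseconds per period in which RT tasks are throttled and
 * non-RT tasks get to run. A negative runtime means throttling
 * is disabled, so the result is 0. */
static long rt_throttled_us(long period_us, long runtime_us)
{
	if (runtime_us < 0 || runtime_us >= period_us)
		return 0;
	return period_us - runtime_us;
}

/* Read both sysctls and return the throttled window in
 * microseconds, or -1 if either file cannot be read. */
static long rt_throttled_us_from_proc(void)
{
	long period, runtime;

	if (read_long_from("/proc/sys/kernel/sched_rt_period_us", &period))
		return -1;
	if (read_long_from("/proc/sys/kernel/sched_rt_runtime_us", &runtime))
		return -1;
	return rt_throttled_us(period, runtime);
}
```

With the usual defaults (period 1000000, runtime 950000) this
gives a 50000 us window per second in which a SCHED_OTHER task
can run.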

I am starting to think that the virtual memory mapping of the
process somehow got corrupted and is fixed at the next regular
context switch. Afterwards there is no switch to any other
non-kernel process, only to ksoftirqd, irq threads or other
threads of the same process. Shortly before the hang there was
some switching between modprobe and kworker, and
sched_process_free of kworker and modprobe in the RCU softirq.

The symptoms are similar to
http://lkml.indiana.edu/hypermail/linux/kernel/1103.0/01364.html

Regards
-- 
                                            Stano
