On 13.05.2013 10:05, Stanislav Meduna wrote:

> 0d...0 62811.755394: function: do_page_fault
> 0....0 62811.755396: function: handle_mm_fault
> 0....0 62811.755398: function: handle_pte_fault
> 0d...0 62811.755402: function: do_page_fault
> 0....0 62811.755404: function: handle_mm_fault
> 0....0 62811.755406: function: handle_pte_fault

The flags in the pagefault handler are 0x28 - if I understand it
correctly, FAULT_FLAG_KILLABLE | FAULT_FLAG_ALLOW_RETRY. The faulting
address is indeed an address on the stack that worked for hours before;
it is mlockall()-ed and I have (of course) no swap.

The code in handle_pte_fault proceeds through the

  entry = pte_mkyoung(entry);

line, and the following ptep_set_access_flags returns zero, i.e. it
reports that the pte was not changed. This repeats ad nauseam without
anything else running in between. I will add some tracing prints to
output the content of the offending pte.

Adding flush_tlb_page(vma, address) at the beginning of
handle_pte_fault does not change anything.

The length of the hang could correlate with the time until some
SCHED_OTHER process is scheduled after the RT throttler activates.
There is a process running every 2 seconds, and the length of the hang
is usually between 1 and 3 seconds. This is not (yet) verified.

I am starting to think that the virtual memory mapping of the process
somehow got corrupted and is fixed at the next regular context switch.
Afterwards there is no switch to another non-kernel process, only to
ksoftirqd, irq threads or other threads of the same process. Shortly
before the hang there was some switching between modprobe and kworker,
and sched_process_free of kworker and modprobe in the RCU softirq.

The symptoms are similar to
http://lkml.indiana.edu/hypermail/linux/kernel/1103.0/01364.html

Regards
-- 
Stano
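
P.S. For anyone who wants to double-check the 0x28 decoding, here is a small user-space sketch. The bit values are copied by hand from what I believe include/linux/mm.h contains in 3.x kernels, so treat them as an assumption and compare against your own tree. decode_fault_flags() is just a helper name made up for this mail, not a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, as in include/linux/mm.h of 3.x kernels; verify locally. */
#define FAULT_FLAG_WRITE        0x01
#define FAULT_FLAG_NONLINEAR    0x02
#define FAULT_FLAG_MKWRITE      0x04
#define FAULT_FLAG_ALLOW_RETRY  0x08
#define FAULT_FLAG_RETRY_NOWAIT 0x10
#define FAULT_FLAG_KILLABLE     0x20

static const struct {
    unsigned int bit;
    const char *name;
} fault_flag_names[] = {
    { FAULT_FLAG_WRITE,        "FAULT_FLAG_WRITE" },
    { FAULT_FLAG_NONLINEAR,    "FAULT_FLAG_NONLINEAR" },
    { FAULT_FLAG_MKWRITE,      "FAULT_FLAG_MKWRITE" },
    { FAULT_FLAG_ALLOW_RETRY,  "FAULT_FLAG_ALLOW_RETRY" },
    { FAULT_FLAG_RETRY_NOWAIT, "FAULT_FLAG_RETRY_NOWAIT" },
    { FAULT_FLAG_KILLABLE,     "FAULT_FLAG_KILLABLE" },
};

/*
 * Hypothetical helper: write the names of all bits set in 'flags' into
 * 'out', separated by " | ", lowest bit first. Returns 'out'.
 */
static const char *decode_fault_flags(unsigned int flags, char *out, size_t len)
{
    size_t i, used = 0;

    assert(out != NULL && len > 0);
    out[0] = '\0';

    for (i = 0; i < sizeof(fault_flag_names) / sizeof(fault_flag_names[0]); i++) {
        int n;

        if (!(flags & fault_flag_names[i].bit))
            continue;

        n = snprintf(out + used, len - used, "%s%s",
                     used ? " | " : "", fault_flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* buffer too small, stop with what fits */
        used += (size_t)n;
    }
    return out;
}
```

With these values, decode_fault_flags(0x28, buf, sizeof(buf)) yields "FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE". Note that the FAULT_FLAG_WRITE bit (0x01) is clear in 0x28.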