On Thu, 19 Oct 2017, Matt Redfearn wrote:
> On 18/10/17 21:34, Thomas Gleixner wrote:
> > On Wed, 11 Oct 2017, Matt Redfearn wrote:
> > > Secondly, the fixed min delta ignores the fact that with MIPS
> > > multithreading active, execution resource within a core is shared
> > > between the hardware threads within that core. An inconveniently timed
> > > switch of executing thread within gic_next_event, between the read and
> > > write of updated count, can result in the CPU writing an event in the
> > > past, and subsequently not receiving a tick interrupt until the counter
> > > wraps. This stalls the CPU from the RCU scheduler. Other CPUs detect
> > > this and print rcu_sched timeout messages in the kernel log. It can
> > > lead to other issues as well if the CPU is holding locks or other
> > > resources at the point at which it stalls.
> > >
> > > Fix this by scaling the min delta for the timer based on the number
> > > of threads in the core (smp_num_siblings). This accounts for the
> > > greater average runtime of CPUs within a multithreading core.
> >
> > I don't understand why this is not caught by the check at the end of the
> > next_event() function:
> >
> > 	res = ((int)(gic_read_count() - cnt) >= 0) ? -ETIME : 0;
> >
> > Btw, the local_irq_save() in this function is pointless as this function is
> > always called with interrupts disabled from the core code.
>
> This is an issue because in some cases (hrtimer_reprogram ->
> clockevents_program_event -> clockevents_program_min_delta, when
> CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=n) there is no retry performed in the
> case of -ETIME. There has been a patch pending for some time
> https://patchwork.kernel.org/patch/8909491/ which ought to address this and
> retry in the case of an event in the past on this call path. But in the
> meantime this patch vastly improves the situation.

I somehow missed that one. Care to repost so we get that solved at the place
where it should be solved?

Thanks,

	tglx
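
For readers following the thread, here is a rough userspace sketch of the
two points discussed above: the wrap-safe "is the event already in the past"
check, and a retry on -ETIME. It is not the kernel code. Every sim_* name is
hypothetical, the counter is simulated, the comparison uses int64_t where the
quoted driver line casts to int, and doubling the delta on each retry is only
an illustrative policy, not necessarily what the pending patch or the
MIN_ADJUST path does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the free-running counter behind gic_read_count(). */
static uint64_t sim_count;
/* Cycles that "elapse" per counter read; a large value models time lost
 * to a sibling hardware thread between the read and the compare write. */
static uint64_t sim_read_cost;
/* Hypothetical stand-in for the compare register. */
static uint64_t sim_compare;

static void sim_reset(uint64_t start, uint64_t read_cost)
{
	sim_count = start;
	sim_read_cost = read_cost;
	sim_compare = 0;
}

static uint64_t sim_read_count(void)
{
	sim_count += sim_read_cost;
	return sim_count;
}

/* Program an event 'delta' cycles ahead. Returns -ETIME if the counter has
 * already reached or passed the programmed value. The signed difference
 * keeps the comparison correct across counter wrap-around. */
static int sim_next_event(uint64_t delta)
{
	uint64_t cnt = sim_read_count() + delta;

	sim_compare = cnt;
	return ((int64_t)(sim_read_count() - cnt) >= 0) ? -ETIME : 0;
}

/* Retry on -ETIME with a larger delta, up to max_tries attempts.
 * Doubling is an assumption made for this sketch only. */
static int sim_program_with_retry(uint64_t min_delta, int max_tries)
{
	uint64_t delta = min_delta;

	assert(min_delta > 0);
	for (int i = 0; i < max_tries; i++) {
		if (sim_next_event(delta) == 0)
			return 0;
		delta *= 2;
	}
	return -ETIME;
}
```

Without the retry, a caller that ignores -ETIME leaves the compare value
behind the counter, which is the stall described in the patch changelog.
With sim_read_cost set larger than the delta, sim_next_event() reports
-ETIME, and sim_program_with_retry() succeeds only once the delta has grown
past the simulated latency.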