Re: spike in latency when calling popen() in the same process that runs a high priority RT thread

Josh Cartwright <josh.cartwright@xxxxxx> · Thu, 30 May 2013 13:02:43 -0500

On Fri, May 24, 2013 at 11:14:49AM -0500, Gratian Crisan wrote:
> We are seeing a fairly big spike in latency for an RT thread when
> another regular thread in the same process calls popen().  I have
> modified cyclictest (see the patch below) to spawn another thread that
> just does popen() in a loop and run cyclictest with the following
> options:
>
> ./cyclictest -m -S --policy=fifo -p 98
>
> Normally the max latency on our system is ~50uS. This is a dual core
> ARM Cortex-A9 running a 3.2.35-rt52 kernel. With this change it spikes
> over 500uS.
> This doesn't happen if the popen() test is run in a separate process.
>
> We are looking for ideas on the root cause for this.

We've continued investigations here.

The following sequence is what happens when we see large latencies:

  RT thread:                   fork thread:
  sys_clock_nanosleep()
  ...
                               sys_fork()
                                dup_mm()
                                 dup_mmap()
                                  down_write(&oldmm->mmap_sem)
                                   for each vma, for each valid mapping:
                                    if (is_cow_mapping)
                                     mark pte wrprotect
  ...
  *woken*
  do_page_fault()
   down_read(&mm->mmap_sem)
    block. (boost fork thread prio)

COW mappings in the parent process are write-protected during fork().
If the RT thread wakes up and writes to any COW-able region while the
page tables are still being copied into the new process, it takes a
page fault and then blocks on mmap_sem, which the forking thread holds
for writing until dup_mmap() has finished.

The time the RT thread can spend sleeping on mmap_sem grows with the
number of existing page table mappings for the process.  If that
process uses mlockall() (say, cyclictest with -m), its pages are
faulted in and locked, so many valid mappings exist and the copying can
take a while.  Running without mlockall() still hits the condition
above, but the impact is much smaller because fewer mappings exist at
the time of fork().
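
A rough way to see this effect from userspace is to time fork() with
different amounts of populated memory.  The sketch below is only an
illustration: time_fork_ns() is a hypothetical helper, it measures the
fork() call in the parent rather than the RT thread's wakeup latency,
and the absolute numbers will depend on the kernel, the hardware, and
whether huge pages are in use.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: populate `bytes` of private anonymous memory,
 * then return how long a single fork() call takes in the parent, in
 * nanoseconds.  Returns -1 on error.
 */
static long long time_fork_ns(size_t bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Write to every page so each one has a valid page table entry. */
    for (size_t off = 0; off < bytes; off += (size_t)page)
        buf[off] = 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);               /* child: exit immediately */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = -1;
    if (pid > 0) {
        waitpid(pid, NULL, 0);  /* reap the child */
        ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
    }
    munmap(buf, bytes);
    return ns;
}
```

Calling it with, for example, 1 MiB and then 256 MiB and comparing the
two results should give a feel for how the cost of copying the page
tables scales on a given system.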

The solution we've gone with is to audit our codebase and remove every
use of fork(), including indirect ones such as popen(), from code
running in the same process as our RT threads.  In nearly all cases,
vfork()/exec() has been a suitable alternative that doesn't affect the
high priority thread.
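
For illustration, here is a minimal sketch of the vfork()/exec()
pattern, assuming the command needs neither a shell nor a pipe back to
the caller.  run_command_vfork() is a hypothetical name, not code from
our tree.  The point is that vfork() does not duplicate the parent's
page tables, so nothing in the parent is write-protected for COW; the
calling thread is simply suspended until the child calls exec or
_exit().

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program at argv[0] using vfork() +
 * execv() and wait for it.  Returns the child's exit status, or -1 if
 * the child could not be created or did not exit normally.
 */
static int run_command_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child: it borrows the parent's address space until exec,
         * so it must do nothing here except exec or _exit().
         */
        execv(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argument vector whose first
element is the full path of the program, e.g. { "/bin/sh", "-c",
"exit 3", NULL }, and get the exit status back (3 in that case, or 127
if the exec itself failed).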

Would it be worth it for us to add a blurb about this on the RT wiki?

  Josh