Re: Real-time projects that could use userspace RCU

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Sun, 13 Jun 2010 16:51:19 -0400

* Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> On Tue, May 11, 2010 at 10:43:37AM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> > > On Tue, May 11, 2010 at 09:14:04AM -0400, Mathieu Desnoyers wrote:
> > > > * John Kacur (jkacur@xxxxxxxxxx) wrote:
> > > > > On Tue, May 11, 2010 at 2:21 PM, Mathieu Desnoyers
> > > > > <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> > > > > > Hi Thomas,
> > > > > >
> > > > > > Paul told me you were quite interested in the userspace RCU library when he told
> > > > > > you about it (http://lttng.org/urcu). Do you have some userspace applications or
> > > > > > libraries with real-time needs in mind that could use it ? We could help moving
> > > > > > them to liburcu. The wait-free read-side is, as you certainly know, a
> > > > > > characteristic of RCU that can be very useful to RT applications.
> > > > > >
> > > > > > [CCing linux-rt-users, as it seems appropriate to ask them too.]
> > > > > >
> > > > > > Thanks,
> > > > > 
> > > > > Do you have any kind of benchmarks? If you had something appropriate
> > > > > we could add it to the rt-tests suite (which includes cyclictest). Not
> > > > > only would this provide an objective measure, but it could also act as
> > > > > a reference implementation for userspace programmers.
> > > > 
> > > > Yes, the library already has its set of benchmark test programs. The results can
> > > > be found in http://lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf
> > > > section 6.5. It shows that RCU read-side is a few orders of magnitude faster
> > > > than lock-based approaches and scales linearly with the number of cores.
> > > > 
> > > > The same PDF, sections 7.6.2 and 7.6.3, presents the architecture-level modeling
> > > > of the RCU mb algorithm in Promela, along with the formal proof by model
> > > > checking for both correctness and progress (the read-side is proven wait-free).
> > > > 
> > > > > 
> > > > > See here.
> > > > > git clone git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
> > > > > 
> > > > > cyclictest is the original program written by Thomas, maintained by
> > > > > Clark Williams now. Most - but not all, of the additional tests are
> > > > > modelled after this program, so you might want to have a look at that
> > > > > if you're not already familiar with it.
> > > > 
> > > > Thanks for the pointer, I did know about cyclictest, but not the others. Since
> > > > the read-side does not involve the OS nor blocking, I wonder which of these
> > > > tests would be even a near-match though.
> > > 
> > > Why not add mutual-exclusion tests, including locking, per-thread locking,
> > > reader-writer locking, and RCU?  The figure of merit would be maximum
> > > latency rather than throughput, but the existing userspace-rcu tests should
> > > be pretty close.
> > > 
> > 
> > Do you mean adding our RCU tests to the rt-tests.git tree or adding more
> > information in our own tests ? Also, the maximum latency is quite dependent on
> > the rest of the workload running on the system, so we might have to generate
> > such a workload while the test runs to give an interesting and accurate view of
> > the maximum latency.
> > 
> > Maybe running one (or many) of the already existing rt-tests in parallel would
> > do.
> > 
> > Thoughts ?
> 
> My thought was a variant of our existing RCU tests.  Something like:
> 
> 	clock_gettime();
> 	pthread_mutex_lock();
> 	clock_gettime();
> 	/* compute latency, accumulate average and maximum */
> 
> The test thread would need to have real-time priority.  Then print the
> maximums for various mechanisms.
> 
> Does this seem like a reasonable approach?

(Sorry for the delayed answer; I've been deeply focused on a ring buffer
implementation lately.)

Hrm, the only thing I'm afraid of is that RT latency tests are quite different
from throughput benchmarks. Basically, for an RCU throughput benchmark, just
creating a few simple tests is fine. However, in the case of RT behavior
testing, creating this test thread is just one part of the equation. Quickly
reproducing conditions that can lead to priority inversions in an automated way
should also be part of the picture.

In addition, we might want to measure the overall time it takes to get the lock,
execute the critical section and release the lock, rather than just assuming
that only the "lock" part matters. Some "clever" token-based fair locking scheme
can have a more involved unlock primitive, for instance.
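
A minimal sketch of that whole-sequence measurement, assuming a pthread mutex
as the mechanism under test. The names struct lat_stats,
measure_lock_cs_unlock() and shared_counter are hypothetical, made up for this
illustration; they are not part of liburcu or rt-tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical accumulator for illustration; not a liburcu or rt-tests type. */
struct lat_stats {
	uint64_t count;		/* number of samples taken */
	uint64_t total_ns;	/* sum of all samples, to derive the average */
	uint64_t max_ns;	/* worst-case sample seen so far */
};

static pthread_mutex_t test_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long shared_counter;	/* stand-in for protected data */

/* Nanoseconds elapsed from *a to *b; assumes *b is not earlier than *a. */
static uint64_t ts_delta_ns(const struct timespec *a, const struct timespec *b)
{
	int64_t sec = (int64_t)b->tv_sec - (int64_t)a->tv_sec;
	int64_t nsec = (int64_t)b->tv_nsec - (int64_t)a->tv_nsec;

	return (uint64_t)(sec * 1000000000LL + nsec);
}

/*
 * Time the whole lock / critical section / unlock sequence, so that a
 * costly unlock primitive is counted as well as the lock acquisition.
 */
static void measure_lock_cs_unlock(struct lat_stats *st, unsigned long loops)
{
	struct timespec t0, t1;
	unsigned long i;
	uint64_t d;

	assert(st != NULL);
	for (i = 0; i < loops; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		pthread_mutex_lock(&test_lock);
		shared_counter++;	/* trivial critical section */
		pthread_mutex_unlock(&test_lock);
		clock_gettime(CLOCK_MONOTONIC, &t1);

		d = ts_delta_ns(&t0, &t1);
		st->count++;
		st->total_ns += d;
		if (d > st->max_ns)
			st->max_ns = d;
	}
}
```

The calling thread would still need real-time priority and a competing
workload for the maximum to mean anything. Replacing the mutex calls with
rcu_read_lock()/rcu_read_unlock() should give the comparable RCU read-side
figure.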

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com