Re: [PATCH v3 1/6] ANDROID: binder: add support for RT prio inheritance.

On Thu, Nov 16, 2017 at 02:03:13PM +0100, Martijn Coenen wrote:
> On Thu, Nov 16, 2017 at 12:27 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > But that's exactly what you get!? How would it ever get below
> > min_node_prio? PI only ever (temporarily) raises prio, it never lowers
> > it.
> 
> Ah I guess we have different definitions of inheritance then.

Well, I go by the one described in all the real-time computing texts;
also found on Wikipedia FWIW:

  https://en.wikipedia.org/wiki/Priority_inheritance

> This behavior is also related to binder's threadpool - all those
> threads run at a default prio of 120 (nice 0), and call into the
> kernel to wait for work. If somebody then makes a call with a lower
> prio (>120), we do change the binder thread that handles that call to
> use the same low prio as well. Why run at nice 0 if the caller runs at
> nice 10?

Not immediately saying it doesn't make sense in the sync-rpc scenario;
just saying that calling it PI is terribly confusing.

> Yes, in case of asynchronous transactions it's not PI at all.

It might be good to untangle some of that.

> > I'm not understanding; are you saying that some app does an async
> > request for sensor data, then later this sensor triggers and we need to
> > route the data back to the app?
> 
> The way this typically works is that an interested app creates a
> binder node to receive sensor data on (basically a callback). The app
> then passes this node into the Sensor HAL and says "call me back when
> you have relevant sensor data". When the Sensor HAL has data, it makes
> calls on this node with the relevant data. Because the receiving app
> process has a few generic binder threads that may receive any type of
> data, not just sensor data, we want to raise the priority only when
> handling sensor data, and restore it to the default when the thread is
> done.

I'm still struggling with the fact that _binder_ has threads and not the
apps themselves. Totally weird that.

But yes, that sounds roughly similar. App requests data through
subscription, event happens/gets delivered, app does something.

> > That smells like fail; if you need to raise your prio it means you are
> > low(er) prio and could at that point be subject to inversion.
> 
> True - it does expose a race condition, but it's still better than not
> raising the priority at all. I'm looking into selecting a busy thread
> and boosting it. Though in general we don't run into this very often,
> because we size the threadpool large enough.

Right; but real-time systems are about guarantees, not about mostly and
on average.

But yes, you touch upon one way to have mixed-priority thread pools. You
need a max-prio receiver thread that is basically always runnable; this
is the one thread doing poll(). It is the thread you can immediately
assign your rt_mutex ownership to, etc.

This thread then selects a victim thread to handle the received msg. In
case of no idle thread available, it will queue the msg (on priority
obviously) and assign ownership of the rt_mutex to any one random
thread.

This thread will then get boosted, complete the current work and find
the highest prio message to continue with.

And while this sounds pretty straightforward, it gets quite tricky when
you get a second message to queue etc.. IIRC you have to always 'boost'
the lowest-priority running thread, such that we end up guaranteeing to
run the N (size of thread pool) highest-prio msgs or somesuch.

Ideally you can use dedicated threads for the RT part of the workload
and avoid all this pain.

> > Regular RPCs are synchronous; if A asks B to do something, A blocks. B
> > can then not turn around and ask A something. That is called a deadlock.
> 
> In case of binder, this actually works, we call it "recursive calls";
> the same thread in A that called into B can receive an incoming call
> from B and handle that. Of course you still need to be careful when
> that happens - eg if you hold a lock when making the first outbound
> call, you better make sure you can't take that same lock on any
> inbound call that can result from it. "Don't hold locks over
> synchronous binder calls" is a well-adopted Android design paradigm.

Sounds really weird to me though; I'd be curious to know why this
'feature' was created.

> >> What parts of the system aren't designed for RT?
> >
> > None of it as far I can tell. The shared thread pools, the quick fix
> > priority, the nested transaction stuff, those all reek of utter fail and
> > confusion.
> 
> Ack. FWIW I don't think it's really fail and confusion - I believe
> these were conscious design choices for binder and at the time I doubt
> that RT was even considered.

That's more or less what I was trying to say. The design never
considered RT and trying to retrofit RT into it makes for 'interesting'
things.

> Like every system binder has its warts, but it has worked very well
> for Android. The same holds for this patchset - it has very measurable
> improvements on Android system performance, and I don't think the code
> or behavior is incorrect. Its major weakness is that it has to work
> with the existing threadpool paradigm and therefore depends on some
> internal scheduler APIs.

I take it changing this stuff is 'difficult' since much of it has
been directly exposed to apps? And you'll need to build a parallel
interface and slowly migrate apps away from it?

> Thanks for all the feedback. I'll do some more reading and talking
> internally. As far as this series goes: since all the things we talked
> about really only affect Android, is your main concern with merging
> this the use of some internal scheduler APIs?

Yeah I suppose so. Also I think the comments could be made clearer to
avoid some of the confusion we've had here.

> In general I'd really like to keep the upstream binder driver as close
> as possible to what we use in Android, and until we have an
> alternative for PI (if we can agree on something and build it), this
> is a pretty significant delta.

I understand.

_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


