On 07/21/2015 03:26 PM, Andy Lutomirski wrote:
On Tue, Jul 21, 2015 at 12:10 PM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
So just for the sake of precision, the thing I'm talking about
is the lru_add_drain() call on kernel exit. Are you proposing
that we call that for every nohz_full core on kernel exit?
I'm not opposed to this, but I don't know if other nohz
developers feel like this is the right tradeoff.
I'm proposing either that we do that or that we arrange for other cpus
to be able to steal our LRU list while we're in RCU user/idle.
That seems challenging; there is a lot that has to be done in
lru_add_drain() and we may not want to do it for the "soft
isolation" mode Frederic alludes to in a later email. And, we
would have to add a bunch of locking to allow another process
to steal the list from under us, so that's not obviously going
to be a performance win in terms of the per-cpu page cache
for normal operations.
Perhaps there could be a lock taken that nohz_full processes
have to take just to exit from userspace, and that other tasks
could take to do things on behalf of the nohz_full process that
it thinks it can do locklessly. It gets complicated, since you'd
want to tie that to whether the nohz_full process was currently
in the kernel or not, so some kind of atomic update on the
context_tracking state or some such, perhaps. Still not really
clear if that overhead is worth it (both from a maintenance
point of view and the possible performance hit).
Limiting it just to the hard isolation mode seems like a good
answer since there we really know that userspace does not
care about the performance implications of kernel/userspace
transitions, and it doesn't cause slowdowns to anyone else.
For now I will bundle it in with my respin as part of the
"hard isolation" mode Frederic proposed.
Well, in principle if we accepted my proposed patch series
and then over time came to decide that it was reasonable
for nohz_full to have these complete cpu isolation
semantics, the one proposed ABI simply becomes a no-op.
So it's not as problematic an ABI as some.
What if we made it a debugfs thing instead of a prctl? Have a mode
where the system tries really hard to quiesce itself even at the cost
of performance.
No, since it's really a mode within an individual task that you'd
like to switch on and off depending on what the task is trying
to do - strict mode while it's running its main fast-path userspace
code, but certainly not strict mode during its setup, and possibly
leaving strict mode to run some kinds of slow-path, diagnostic,
or error-handling code.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html