Re: [PATCH v9 00/13] support "task_isolation" mode for nohz_full

Ping!  There has been no substantive feedback to this version of
the patch in the week since I posted it, which optimistically suggests
to me that people may be satisfied with it.  If that's true, Frederic,
I assume this would be pulled into your tree?

I have slightly updated the v9 patch series since this posting:

- Incorporated a fix to initialize cpu_isolation_mask early if no
  cpu_isolation= boot argument was given, to avoid crashing on
  CPUMASK_OFFSTACK platforms.

- Incorporated Mark Rutland's changes to convert arm64
  assembly to C code instead of using my own version.

The updated patch series is available in the branch at

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git dataplane

I will post a v10 with those two small changes if I don't hear
any other feedback; of course, feel free to pull from the git repo.

On 01/04/2016 02:34 PM, Chris Metcalf wrote:
It has been a couple of months since the v8 version of this patch,
since various other priorities came up at work.  Since it's been
a while I will try to summarize where I think we got to on the
various issues that were raised with v8.

1. Andy Lutomirski raised the issue of whether it really made sense to
    only attempt to set up the conditions for task isolation, ask the kernel
    nicely for it, and then wait until it happened.  He wondered if a
    SCHED_ISOLATED class might be a helpful abstraction.  Steven Rostedt
    also suggested having an interface that would force everything else
    off a core to enable SCHED_ISOLATED to succeed.  Frederic added
    some concerns about enforcing the test that the process was in a
    good state to enter task isolation.

    I tried to address the different design philosophies for what I called
    the original "polite" mode and the reviewers' suggestions for an
    "aggressive" mode in this email:

    https://lkml.org/lkml/2015/10/26/625

    As I said there, on balance I think the "polite" option is still
    better.  Obviously folks are welcome to disagree and I'm happy to
    continue that conversation (or perhaps I convinced everyone).

2. Andy didn't like the idea of having a "STRICT" mode which
    delivered a signal to a process for violating its promise to
    stay out of the kernel.  Gilad Ben Yossef argued that
    it made sense to have a way for the kernel to enforce the requested
    correctness guarantee of never being interrupted.  Andy pointed out
    that we should then really deliver such a signal when the kernel
    delivers an asynchronous interrupt to the core as well.  In particular
    this is a concern for the application-error case of a process that
    calls munmap() on one core while a thread on another core is running
    STRICT, and thus gets an unexpected TLB flush.

    This patch series addresses that concern by including support for
    IRQs, IPIs, and similar asynchronous interrupts to also send the
    STRICT signal to the process.  We don't try to send the signal if
    we are in an NMI, and instead just force a console backtrace like
    you would get in task_isolation_debug mode.

3. Frederic nack'ed my patch for a boot flag to disable the 1Hz
    periodic scheduler tick.

    I'm still hoping he's open to changing his mind about that, but in
    this patch series I have removed that boot flag.
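
To make the STRICT mode in item 2 concrete, here is a rough userspace
sketch of how a task might ask for isolation with a chosen signal.
PR_TASK_ISOLATION_STRICT is named in the series; the other constant
names, all of the numeric values, and the helpers isolation_flags() and
request_isolation() are assumptions for illustration and should be
checked against include/uapi/linux/prctl.h in the series itself.

```c
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: names and values below are illustrative stand-ins for
 * what the series adds to include/uapi/linux/prctl.h.  They are not
 * in mainline headers; verify them against the patches before use.
 */
#ifndef PR_SET_TASK_ISOLATION
#define PR_SET_TASK_ISOLATION           48
#define PR_TASK_ISOLATION_ENABLE        (1 << 0)
#define PR_TASK_ISOLATION_STRICT        (1 << 1)
#define PR_TASK_ISOLATION_SET_SIG(sig)  (((sig) & 0x7f) << 8)
#endif

/* Compose the prctl() argument: enable, optionally STRICT plus a signal. */
unsigned long isolation_flags(int strict, int sig)
{
	unsigned long flags = PR_TASK_ISOLATION_ENABLE;

	if (strict)
		flags |= PR_TASK_ISOLATION_STRICT |
			 PR_TASK_ISOLATION_SET_SIG(sig);
	return flags;
}

/*
 * Hypothetical helper: returns 0 on success or -errno on failure.
 * On a kernel without the series applied the call is expected to fail.
 */
int request_isolation(int strict, int sig)
{
	if (prctl(PR_SET_TASK_ISOLATION, isolation_flags(strict, sig),
		  0, 0, 0) < 0)
		return -errno;
	return 0;
}
```

A task would typically install a handler for the chosen signal (for
example SIGUSR1) before calling request_isolation(1, SIGUSR1), so that
a violation is reported to the application rather than killing it.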

Various other changes have been introduced since v8:

https://lkml.kernel.org/r/1445373372-6567-1-git-send-email-cmetcalf@xxxxxxxxxx

- Rebased to Linux 4.4-rc5.

- Since nohz_full and isolcpus have been separated back out again in
   4.4, I introduced a new task_isolation=MASK boot argument that sets
   both of them.  The task isolation support now requires that this
   boot flag be used; it intentionally doesn't work if you've
   just enabled nohz_full and isolcpus separately.  I could be
   convinced that doing it the other way around makes sense, though.

- I folded the two STRICT mode patches together since there didn't
   seem to be much value in having the second patch that just enabled
   having a settable signal.  I also refactored the various routines
   that report on interrupts/exceptions/etc to make it easier to hook
   in from the case where we are interrupted asynchronously.

- For the debug support, I moved most of the functionality into
   kernel/isolation.c and out of kernel/sched/core.c, leaving only a
   small hook to handle mapping a remote cpu to a task struct safely.
   In addition to implementing Andy's suggestion of signalling a task
   when it is interrupted asynchronously, I also added a ratelimit
   hook so we won't spam the console if (for example) a timer interrupt
   runs amok - particularly since when this happens without ratelimit,
   it can end up self-perpetuating the timer interrupt.

- I added a task_isolation_debug_cpumask() helper function to check
   all the cpus in a mask to see if they are being interrupted
   inappropriately.

- I made the check for irq_enter() robust to architectures that
   have already entered user mode context_tracking before calling
   irq_enter() by testing user_mode(get_irq_regs()) instead of
   context_tracking_in_user(), and split out the code to a separate
   inlined function so I could comment it better.

- For arm64, I added a task_isolation_debug_cpumask() hook for
   smp_cross_call(), which I had missed in the earlier versions.

- I generalized the fix for tile to set up a clockevents hook for
   set_state_oneshot_stopped() to also apply to the arm_arch_timer,
   which I realized was showing the same problem.  For both cases,
   this seems to be what Viresh had in mind with commit 8fff52fd509345
   ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state").

- For tile, I adopted the arm model of doing user_exit() calls in the
   early assembly code (a new patch in this series).  I also added a
   missing task_isolation_debug hook for tile's IPI and remote cache
   flush code.
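
The ratelimit hook and the task_isolation_debug_cpumask() helper
described in the list above can be modelled in plain userspace C
roughly as follows.  This is only a sketch of the idea, not the code
in kernel/isolation.c: the struct, both function names, the 64-bit
mask representation, and the interval/burst policy are all assumptions
made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interval/burst limiter: at most 'burst' reports per window. */
struct ratelimit {
	uint64_t interval_ms;      /* length of one window */
	unsigned burst;            /* reports allowed per window */
	uint64_t window_start_ms;  /* start of the current window */
	unsigned emitted;          /* reports allowed in this window */
	unsigned suppressed;       /* reports dropped so far */
};

/* Return true if a report may be printed at time 'now_ms'. */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		/* A new window begins; reset the per-window count. */
		rl->window_start_ms = now_ms;
		rl->emitted = 0;
	}
	if (rl->emitted < rl->burst) {
		rl->emitted++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/*
 * Model of "check all the cpus in a mask": for each cpu that is both
 * about to be interrupted ('targets') and running an isolated task
 * ('isolated'), count a report if the limiter allows one.  Returns
 * the number of reports that would reach the console.
 */
unsigned debug_cpumask(uint64_t targets, uint64_t isolated,
		       struct ratelimit *rl, uint64_t now_ms)
{
	uint64_t hits = targets & isolated;
	unsigned reports = 0;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(hits & (UINT64_C(1) << cpu)))
			continue;
		if (ratelimit_allow(rl, now_ms))
			reports++;
	}
	return reports;
}
```

Passing the current time in explicitly keeps the model deterministic;
the point is only that once the burst is used up, further interrupts
in the same window are counted but not printed, so a runaway timer
cannot feed itself through console output.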

Chris Metcalf (12):
   vmstat: add vmstat_idle function
   lru_add_drain_all: factor out lru_add_drain_needed
   task_isolation: add initial support
   task_isolation: support PR_TASK_ISOLATION_STRICT mode
   task_isolation: add debug boot flag
   arch/x86: enable task isolation functionality
   arch/arm64: adopt prepare_exit_to_usermode() model from x86
   arch/arm64: enable task isolation functionality
   arch/tile: adopt prepare_exit_to_usermode() model from x86
   arch/tile: move user_exit() to early kernel entry sequence
   arch/tile: enable task isolation functionality
   arm, tile: turn off timer tick for oneshot_stopped state

Christoph Lameter (1):
   vmstat: provide a function to quiet down the diff processing

  Documentation/kernel-parameters.txt  |  16 +++
  arch/arm64/include/asm/thread_info.h |  18 ++-
  arch/arm64/kernel/entry.S            |   6 +-
  arch/arm64/kernel/ptrace.c           |  12 +-
  arch/arm64/kernel/signal.c           |  35 ++++--
  arch/arm64/kernel/smp.c              |   2 +
  arch/arm64/mm/fault.c                |   4 +
  arch/tile/include/asm/processor.h    |   2 +-
  arch/tile/include/asm/thread_info.h  |   8 +-
  arch/tile/kernel/intvec_32.S         |  51 +++-----
  arch/tile/kernel/intvec_64.S         |  54 +++------
  arch/tile/kernel/process.c           |  83 +++++++------
  arch/tile/kernel/ptrace.c            |  19 +--
  arch/tile/kernel/single_step.c       |   8 +-
  arch/tile/kernel/smp.c               |  26 ++--
  arch/tile/kernel/time.c              |   1 +
  arch/tile/kernel/traps.c             |  13 +-
  arch/tile/kernel/unaligned.c         |  16 ++-
  arch/tile/mm/fault.c                 |   6 +-
  arch/tile/mm/homecache.c             |   2 +
  arch/x86/entry/common.c              |  10 +-
  arch/x86/kernel/traps.c              |   2 +
  arch/x86/mm/fault.c                  |   2 +
  drivers/clocksource/arm_arch_timer.c |   2 +
  include/linux/isolation.h            |  80 +++++++++++++
  include/linux/sched.h                |   3 +
  include/linux/swap.h                 |   1 +
  include/linux/vmstat.h               |   4 +
  include/uapi/linux/prctl.h           |   8 ++
  init/Kconfig                         |  20 ++++
  kernel/Makefile                      |   1 +
  kernel/irq_work.c                    |   5 +-
  kernel/isolation.c                   | 225 +++++++++++++++++++++++++++++++++++
  kernel/sched/core.c                  |  18 +++
  kernel/signal.c                      |   5 +
  kernel/smp.c                         |   6 +-
  kernel/softirq.c                     |  33 +++++
  kernel/sys.c                         |   9 ++
  mm/swap.c                            |  13 +-
  mm/vmstat.c                          |  24 ++++
  40 files changed, 665 insertions(+), 188 deletions(-)
  create mode 100644 include/linux/isolation.h
  create mode 100644 kernel/isolation.c


--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
