Re: [RFC PATCH v7 1/7] Restartable sequences system call

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Wed, 10 Aug 2016 16:47:42 +0000 (UTC)

----- On Aug 3, 2016, at 2:29 PM, Chris Lameter cl@xxxxxxxxx wrote:

> On Tue, 26 Jul 2016, Mathieu Desnoyers wrote:
> 
>> > What problem does this solve?
>>
>> It allows user-space to perform update operations on per-cpu data without
>> requiring heavy-weight atomic operations.
> 
> 
> This is great but seems to indicate that such a facility would be better
> for kernel code instead of user space code.

It would be interesting to eventually investigate whether rseq is
also useful for kernel code. That question seems orthogonal to its
usefulness for user-space code, though.

Rseq for user-space only needs to hook into preemption and signal delivery,
which doesn't seem to have measurable effects on overall performance.
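To make the user-space side concrete, here is a rough single-threaded model of the restart protocol. It assumes a per-thread area holding the current cpu_id and an event_counter that the kernel increments on preemption and signal delivery. The names struct rseq_model, model_preempt() and percpu_inc() are hypothetical and are not the ABI of this patch. In the real design, the final check and the committing store form a short assembly sequence whose instruction range the kernel knows about, so a preemption landing there moves the ip to an abort handler. Plain C cannot express that, so this model only injects a simulated preemption between the speculative work and the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-thread area shared with the kernel.
 * Field names are illustrative, not the exact ABI of this patch. */
struct rseq_model {
    uint32_t cpu_id;        /* updated by the "kernel" when we run */
    uint32_t event_counter; /* bumped on preemption or signal delivery */
};

#define NR_CPUS_MODEL 4
static long percpu_count[NR_CPUS_MODEL];

/* Stand-in for the kernel's preemption/signal hook: bump the counter
 * and (possibly) resume the thread on another CPU. */
static void model_preempt(struct rseq_model *rs, uint32_t new_cpu)
{
    rs->event_counter++;
    rs->cpu_id = new_cpu;
}

/* Increment the current CPU's counter, restarting whenever the event
 * counter changed since the snapshot. `inject` simulates one preemption
 * in the middle of the first attempt. Returns the number of attempts. */
static int percpu_inc(struct rseq_model *rs, int inject)
{
    for (int attempt = 1;; attempt++) {
        uint32_t ev = rs->event_counter;        /* snapshot */
        uint32_t cpu = rs->cpu_id;
        assert(cpu < NR_CPUS_MODEL);
        long newval = percpu_count[cpu] + 1;    /* speculative work */
        if (inject && attempt == 1)
            model_preempt(rs, (cpu + 1) % NR_CPUS_MODEL);
        /* In the real design this check and the store below are one
         * commit sequence protected by the kernel's ip fixup; in plain
         * C they are not atomic, which is why this is only a model. */
        if (rs->event_counter != ev)
            continue;                           /* interrupted: restart */
        percpu_count[cpu] = newval;             /* commit */
        return attempt;
    }
}
```

On a fresh state, percpu_inc(&rs, 0) commits on the first attempt, while percpu_inc(&rs, 1) abandons its first attempt after the simulated preemption and commits on the second, into the slot of the CPU it was "moved" to.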

Supporting rseq in kernel code would require hooking into additional sites:

- preemption of kernel code (for atomicity wrt other threads). This would
  replace preempt_disable()/preempt_enable() critical sections touching
  per-cpu data shared with other threads. We would have to do the event_counter
  increment and ip fixup directly in the sched_out hook when preempting
  kernel code.
- possibly interrupt handlers (for atomicity wrt interrupts). This would
  replace local irq save/restore when touching per-cpu data shared with
  interrupt handlers. We would have to increment the event_counter and
  perform the ip fixup on the pre-irq kernel frame.
- possibly NMI handlers (for atomicity wrt NMIs). This would replace
  local atomic operations on per-cpu data shared with NMIs, which are
  currently protected by disabling preemption or interrupts. We would
  have to increment the event_counter and perform the ip fixup on the
  pre-NMI kernel frame.

Those supplementary hooks may add significant overall performance overhead,
so careful benchmarking would be required to figure out if it's worth it.
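For a sense of what each of those hooks would have to run, the following sketch models the event_counter increment and ip fixup described above. It assumes a critical-section descriptor carrying start, post-commit and abort addresses; struct cs_desc, struct preempt_ctx and model_preempt_hook() are hypothetical names for illustration, and a real hook would operate on saved register state rather than on a plain struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a registered critical section.
 * Addresses are plain integers in this model. */
struct cs_desc {
    uintptr_t start_ip;        /* first instruction of the sequence */
    uintptr_t post_commit_ip;  /* first instruction after the commit */
    uintptr_t abort_ip;        /* where to resume after interruption */
};

/* Hypothetical state the hook operates on. */
struct preempt_ctx {
    uintptr_t ip;              /* saved instruction pointer */
    uint32_t event_counter;    /* per-thread event counter */
    const struct cs_desc *cs;  /* active critical section, or NULL */
};

/* Model of a sched_out (or irq/NMI entry) hook: always bump the event
 * counter and, if the saved ip lies inside [start_ip, post_commit_ip),
 * redirect it to abort_ip. Returns 1 if the ip was fixed up, else 0. */
static int model_preempt_hook(struct preempt_ctx *ctx)
{
    ctx->event_counter++;
    const struct cs_desc *cs = ctx->cs;
    if (cs == NULL)
        return 0;
    assert(cs->start_ip <= cs->post_commit_ip);
    if (ctx->ip >= cs->start_ip && ctx->ip < cs->post_commit_ip) {
        ctx->ip = cs->abort_ip;
        return 1;
    }
    return 0;
}
```

The per-call work is only a counter increment and a range check, but for kernel code it would sit on the preemption path and possibly on every irq or NMI entry, which is why its aggregate cost would need to be measured rather than assumed.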

> 
>> First, prohibiting migration from user-space has been frowned upon
>> by scheduler developers for a long time, and I doubt this mindset will
>> change.
> 
> Note that the task isolation patchset from Chris Metcalf does something
> that goes a long way towards this. If you set strict isolation mode then
> the kernel will terminate the process or notify you if the scheduler
> becomes involved. In some way we are getting that as a side effect.

AFAIU, what you propose here is doable at the application design level.
We want to introduce rseq to speed up memory allocation, tracing, and
other uses of per-cpu data without having to modify the design of each
and every user-space application out there.

> Also prohibiting migration is trivial from user space. Just do a taskset
> to a single cpu.

This is also possible if you can redesign user-space applications, but not
from a library perspective. Invoking system calls to change the affinity of
a thread at each and every critical section would kill performance. Setting
the affinity of a thread from a library on behalf of the application and
leaving it affined requires changes to the application design.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com