Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Tue, 7 Jul 2020 08:06:20 -0400 (EDT)

----- On Jul 7, 2020, at 7:32 AM, Florian Weimer fw@xxxxxxxxxxxxx wrote:

> * Mathieu Desnoyers:
> 
>> Those are very good points. One possibility we have would be to let
>> glibc do the rseq registration without the RSEQ_FLAG_RELIABLE_CPU_ID
>> flag. On kernels with the bug present, the cpu_id field is still good
>> enough for typical uses of sched_getcpu() which does not appear to
>> have a very strict correctness requirement on returning the right
>> cpu number.
>>
>> Then libraries and applications which require a reliable cpu_id
>> field could check this on their own by calling rseq with the
>> RSEQ_FLAG_RELIABLE_CPU_ID flag. This would not make the state more
>> complex in __rseq_abi, and let each rseq user decide about its own
>> fate: whether it uses rseq or keeps using an rseq-free fallback.
>>
>> I am still tempted to allow combining RSEQ_FLAG_REGISTER |
>> RSEQ_FLAG_RELIABLE_CPU_ID for applications which would not be using
>> glibc, and want to check this flag on thread registration.
> 
> Well, you could add a bug fix level field to the __rseq_abi variable.

Even though I initially planned to make the struct rseq_abi extensible,
the __rseq_abi variable ends up being fixed-size, so we need to be very
careful in choosing what we place in the remaining 12 bytes of padding.
I suspect we'd want to keep 8 bytes to express a pointer to an
"extended" structure.
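
To make the space budget concrete, here is a rough sketch of how the
32-byte area could be carved up. The first four members follow the
current layout as I understand it; "spare" and "extended_ptr" are
hypothetical names for the 12 bytes of padding, used only for
illustration. This is not the kernel UAPI header and not a proposal for
the final ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative mirror of the fixed-size, 32-byte rseq area.
 * "spare" and "extended_ptr" are hypothetical: they only show how the
 * 12 bytes of padding could be split into 4 + 8 bytes.
 */
struct rseq_area_sketch {
	uint32_t cpu_id_start;	/* offset 0 */
	uint32_t cpu_id;	/* offset 4 */
	uint64_t rseq_cs;	/* offset 8 */
	uint32_t flags;		/* offset 16 */
	uint32_t spare;		/* offset 20: hypothetical, 4 bytes left over */
	uint64_t extended_ptr;	/* offset 24: hypothetical pointer to an
				 * "extended" structure */
} __attribute__((aligned(32)));

/* The sketch must not grow past the fixed size of the existing area. */
static_assert(sizeof(struct rseq_area_sketch) == 32,
	      "sketch must stay 32 bytes");
static_assert(offsetof(struct rseq_area_sketch, spare) == 20,
	      "padding starts right after flags");
static_assert(offsetof(struct rseq_area_sketch, extended_ptr) == 24,
	      "8-byte pointer slot is naturally aligned");
```

Reserving the last 8 bytes for a pointer leaves only 4 bytes for
anything else, which is why each additional field needs to be weighed
carefully.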

I wonder if a bug fix level "version" is the right approach. We could
instead have a bitmask of fixes, which the application could independently
check. For instance, some applications may care about cpu_id field
reliability, and others not.
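
As a minimal sketch of what I mean by a bitmask, assuming the kernel
would advertise one bit per fix: the bit names and values below are made
up for illustration, and where the mask would actually live is left
open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fix bits: names and values are illustrative only. */
#define RSEQ_FIX_RELIABLE_CPU_ID	(1U << 0)
#define RSEQ_FIX_OTHER_EXAMPLE		(1U << 1)

/*
 * Return true when every fix the caller requires is present in the
 * mask advertised by the kernel. A caller that requires nothing
 * (required == 0) always gets true.
 */
static bool rseq_fixes_present(uint32_t advertised, uint32_t required)
{
	return (advertised & required) == required;
}
```

With this shape, an application that does not care about the cpu_id
field simply leaves that bit out of "required", whereas a single
monotonic version number would force every user to wait for the same
fix level.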

> Then applications could check if the kernel has the appropriate level
> of non-buggyness.  But the same thing could be useful for many other
> kernel interfaces, and I haven't seen such a fix level value for them.
> What makes rseq so special?

I guess my only answer is that I care as a user of the system call, and
what is a system call without users? I have real applications which work
today with end users deploying them on various kernels, old and new, and I
want to take advantage of this system call to speed them up. However, if I
have to choose between speed and correctness (in other words, not crashing
a critical application), I will choose correctness. So if I cannot detect
that I can safely use the system call, it becomes pretty much useless even
for my own use-cases.

> It won't help with the present bug, but maybe we should add an rseq
> API sub-call that blocks future rseq registration for the thread.
> Then we can add a glibc tunable that flips off rseq reliably if people
> do not want to use it for some reason (and switch to the non-rseq
> fallback code instead).  But that's going to help with future bugs
> only.

I don't think it's needed. All I really need is to have _some_ way to
let lttng-ust or liburcu query whether the cpu_id field is reliable. This
state does not have to be made quickly accessible to other libraries,
nor does it have to be shared between libraries. It would allow each
user library or application to make up its own mind on whether it can use
rseq or not.
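
Roughly, the per-library check could look like the sketch below. It
assumes that calling rseq with only the RSEQ_FLAG_RELIABLE_CPU_ID flag
returns 0 on a fixed kernel and fails otherwise, as discussed earlier in
this thread. The flag value used here is a placeholder, and the helper
name is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: placeholder value, not taken from a released kernel header. */
#define RSEQ_FLAG_RELIABLE_CPU_ID_SKETCH	(1 << 1)

/*
 * Probe once whether the kernel accepts the (proposed) query flag and
 * cache the answer. Any failure is treated as "not reliable", so the
 * caller stays on its rseq-free fallback. The cache is not thread-safe;
 * a real library would use its own one-time initialization.
 */
static bool rseq_cpu_id_is_reliable(void)
{
	static int cached = -1;	/* -1: not probed, 0: no, 1: yes */

	if (cached < 0) {
#ifdef __NR_rseq
		long ret = syscall(__NR_rseq, NULL, 0,
				   RSEQ_FLAG_RELIABLE_CPU_ID_SKETCH, 0);

		cached = (ret == 0);
#else
		cached = 0;	/* no rseq system call on this build target */
#endif
	}
	return cached == 1;
}
```

lttng-ust and liburcu would each call something like this at
initialization and pick the rseq or non-rseq code path accordingly,
without needing any shared state between them.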

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com