Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2023-03-06 1:05 p.m., Edgecombe, Rick P wrote:
> +Kan for shadow stack perf discussion.
> 
> On Mon, 2023-03-06 at 16:20 +0000, szabolcs.nagy@xxxxxxx wrote:
>> The 03/03/2023 22:35, Edgecombe, Rick P wrote:
>>> I think I slightly prefer the former arch_prctl() based solution
>>> for a
>>> few reasons:
>>>   - When you need to find the start or end of the shadow stack can
>>> you
>>> can just ask for it instead of searching. It can be faster and
>>> simpler.
>>>   - It saves 8 bytes of memory per shadow stack.
>>>
>>> If this turns out to be wrong and we want to do the marker solution
>>> much later at some point, the safest option would probably be to
>>> create
>>> new flags.
>>
>> i see two problems with a get bounds syscall:
>>
>> - syscall overhead.
>>
>> - discontinous shadow stack (e.g. alt shadow stack ends with a
>>   pointer to the interrupted thread shadow stack, so stack trace
>>   can continue there, except you don't know the bounds of that).
>>
>>> But just discussing this with HJ, can you share more on what the
>>> usage
>>> is? Like which backtracing operation specifically needs the marker?
>>> How
>>> much does it care about the ucontext case?
>>
>> it could be an option for perf or ptracers to sample the stack trace.
>>
>> in-process collection of stack trace for profiling or crash reporting
>> (e.g. when stack is corrupted) or cross checking stack integrity may
>> use it too.
>>
>> sometimes parsing /proc/self/smaps maybe enough, but the idea was to
>> enable light-weight backtrace collection in an async-signal-safe way.
>>
>> syscall overhead in case of frequent stack trace collection can be
>> avoided by caching (in tls) when ssp falls within the thread shadow
>> stack bounds. otherwise caching does not work as the shadow stack may
>> be reused (alt shadow stack or ucontext case).
>>
>> unfortunately i don't know if syscall overhead is actually a problem
>> (probably not) or if backtrace across signal handlers need to work
>> with alt shadow stack (i guess it should work for crash reporting).
> 
> There was a POC done of perf integration. I'm not too knowledgeable on
> perf, but the patch itself didn't need any new shadow stack bounds ABI.
> Since it was implemented in the kernel, it could just refer to the
> kernel's internal data for the thread's shadow stack bounds.
> 
> I asked about ucontext (similar to alt shadow stacks in regards to lack
> of bounds ABI), and apparently perf usually focuses on the thread
> stacks. Hopefully Kan can lend some more confidence to that assertion.

The POC perf patch I implemented tries to use the shadow stack to
replace the frame pointer to construct a callchain of a user space
thread. Yes, it's in the kernel, perf_callchain_user(). I don't think
the current X86 perf implementation handle the alt stack either. So the
kernel internal data for the thread's shadow stack bounds should be good
enough for the perf case.

Thanks,
Kan




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux