Re: [RFC PATCH v3 00/22] arm64: livepatch: Use ORC for dynamic frame pointer validation

"Madhavan T. Venkataraman" <madvenka@xxxxxxxxxxxxxxxxxxx> · Fri, 14 Apr 2023 23:14:32 -0500

On 4/13/23 13:15, Jose E. Marchesi wrote:
> 
>> On Thu, Mar 23, 2023 at 05:17:14PM +0000, Mark Rutland wrote:
>>> Hi Madhavan,
>>>
>>> At a high-level, I think this still falls afoul of our desire to not reverse
>>> engineer control flow from the binary, and so I do not think this is the right
>>> approach. I've expanded a bit on that below.
>>>
>>> I do think it would be nice to have *some* of the objtool changes, as I do
>>> think we will want to use objtool for some things in future (e.g. some
>>> build-time binary patching such as table sorting).
>>>
>>>> Problem
>>>> =======
>>>>
>>>> Objtool is complex and highly architecture-dependent. There are a lot of
>>>> different checks in objtool that all of the code in the kernel must pass
>>>> before livepatch can be enabled. If a check fails, it must be corrected
>>>> before we can proceed. Sometimes, the kernel code needs to be fixed.
>>>> Sometimes, it is a compiler bug that needs to be fixed. The challenge is
>>>> also to prove that all the work is complete for an architecture.
>>>>
>>>> As such, it presents a great challenge to enable livepatch for an
>>>> architecture.
>>>
>>> There's a more fundamental issue here in that objtool has to reverse-engineer
>>> control flow, and so even if the kernel code and the compiler's code generation
>>> are *perfect*, it's possible that objtool won't recognise the structure of the
>>> generated code, and won't be able to reverse-engineer the correct control flow.
>>>
>>> We've seen issues where objtool didn't understand jump tables, so support for
>>> that got disabled on x86. A key objection from the arm64 side is that we don't
>>> want to disable compiler code generation strategies like this. Further, as
>>> compilers evolve, their code generation strategies will change, and it's likely
>>> there will be other cases that crop up. This is inherently fragile.
>>>
>>> The key objection from the arm64 side is that we don't want to
>>> reverse-engineer details from the binary, as this is complex, fragile, and
>>> unstable. This is why we've previously suggested that we should work with
>>> compiler folk to get what we need.
>>
>>> This still requires reverse-engineering the forward-edge control flow in order
>>> to compute those offsets, so the same objections apply with this approach. I do
>>> not think this is the right approach.
>>>
>>> I would *strongly* prefer that we work with compiler folk to get the
>>> information that we need.
>>
>> IDK if it's relevant here, but I did see a commit go by to LLVM that
>> seemed to include such info in a custom ELF section (for the purposes of
>> improving fuzzing, IIUC). Maybe such an encoding scheme could be tested
>> to see if it's reliable or usable?
>> - https://github.com/llvm/llvm-project/commit/3e52c0926c22575d918e7ca8369522b986635cd3
>> - https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-control-flow
>>
>>>
>>> [...]
>>>
>>>> 		FWIW, I have also compared the CFI I am generating with DWARF
>>>> 		information that the compiler generates. The CFIs match
>>>> 		100% for Clang. In the case of gcc, the comparison fails
>>>> 		in 1.7% of the cases. I have analyzed those cases and found
>>>> 		the DWARF information generated by gcc is incorrect. The
>>>> 		ORC generated by my Objtool is correct.
>>>
>>>
>>> Have you reported this to the GCC folk, and can you give any examples?
>>> I'm sure they would be interested in fixing this, regardless of whether we end
>>> up using it.
>>
>> Yeah, at least a bug report is good. "See something, say something."
> 
> By all means, please.  If you guys report these issues on CFI
> divergences in the GCC bugzilla, we will look into fixing them.
> 
> https://gcc.gnu.org/bugzilla

I will try to get the data again and report the problems that I see.

Thanks.

Madhavan