Can ROP Mitigation Measures Be Improved? [-fzero-call-used-regs=all]

Nan ZoE via Gcc-help <gcc-help@xxxxxxxxxxx> · Thu, 12 Oct 2023 17:24:19 +0800

Hello,

I ran further experiments with the -fzero-call-used-regs=all option in
gcc-13.2.0 and looked more closely at the ROP mitigation it applies at
compile time. *My aim was to identify shortcomings in this mitigation and
to try to improve it.* Below, I would like to continue our discussion of
these mitigation mechanisms.

From this paper <https://ieeexplore.ieee.org/document/8445132>, we
understand that the *-fzero-call-used-regs=all* option zeroes the
call-used registers before each function returns. By observing the binaries
compiled with this option, we noticed that almost every '*pop*'
instruction after each function had turned into a '*pxor*' instruction.
Additionally, when comparing the Gadget sets that ropper extracts from a
program before and after adding this option, we found that it
significantly reduced '*pop xxx; ret;*' style Gadgets and, *on average,
reduced the number of Gadgets in the program by 60%*. Intuitively,
clearing register values before a function returns is a simple and
practical operation. *It reduces the number of Gadgets in the program and
prevents register values from leaking when a function returns*. These
protective measures make it harder for attackers to construct ROP chains.

Such low-cost ROP mitigation mechanisms (compared to CFI) can be deployed
in programs on devices such as IoT devices or network equipment. Many of
these devices prioritize fast response times, so their security measures
are often weakened. They would therefore benefit greatly from mitigations
of this kind added at compile time.

To see how this mitigation performs across more programs, I collected
dozens of mainstream open-source programs to build a test suite. I used the
*-fzero-call-used-regs=all* option to generate *50 binary programs*,
including service programs, software, language interpreters, and critical
libraries from the Linux operating system, which we have made available in
the *"orig"* folder on GitHub
<https://anonymous.4open.science/r/roptest-benchmark-00F7/orig/fzero_params>.
We used the ropper and ROPgadget tools to extract the Gadget set of each
program and attempted to evaluate the ROP construction capability of each
Gadget set. For this purpose, we set a ROP construction target: executing
*execve("/bin/sh", 0, 0)*. We injected a simple stack overflow
vulnerability into each program using the *objcopy* tool, while preserving
all of the original program's code so that the original Gadget set stays
intact. We placed the programs with injected vulnerabilities in the
*"vuln"* folder on GitHub
<https://anonymous.4open.science/r/roptest-benchmark-00F7/vuln/fzero_params>,
with each program corresponding to a specific one in the "*orig*" folder.

To build this ROP chain, we first assessed whether the Gadget set allows
arbitrary address writes (i.e., setting memory to "*/bin/sh*"). Secondly,
we evaluated the ability to set the four key registers rdi, rsi, rdx, and
rax (*rdi, rsi, rdx* hold the arguments and *rax* holds the system call
number). Finally, the chain calls execve("/bin/sh", 0, 0) through a
"*syscall*" Gadget. Two notes: 1) we use the injected vulnerabilities to
validate that the generated ROP payloads are correct, and 2) where Gadget
addresses are affected by code randomization, we assume that we have
already leaked addresses or bypassed address randomization by other means.
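
As an illustration only, the three steps above could be laid out like this
in Python. Every Gadget address and the writable address are made-up
placeholders (they come from none of the 50 test binaries), and the sketch
assumes the simplest possible "pop xxx; ret;" Gadgets, which are exactly
the ones this option makes scarce; the real payload scripts use whatever
Gadgets each binary actually offers.

```python
import struct

# Hypothetical placeholder addresses, NOT taken from any real binary.
# A real chain would use addresses reported by ropper or ROPgadget.
POP_RDI = 0x401001          # pop rdi; ret
POP_RSI = 0x401003          # pop rsi; ret
POP_RDX = 0x401005          # pop rdx; ret
POP_RAX = 0x401007          # pop rax; ret
MOV_PTR_RDI_RAX = 0x401009  # mov qword ptr [rdi], rax; ret
SYSCALL = 0x40100B          # syscall
WRITABLE = 0x404800         # assumed writable address, e.g. inside .bss

SYS_EXECVE = 59             # execve syscall number on x86_64 Linux


def q(value: int) -> bytes:
    """Pack one 64-bit little-endian stack slot."""
    return struct.pack("<Q", value)


def build_execve_chain() -> bytes:
    """Lay out the stack slots for execve("/bin/sh", 0, 0)."""
    chain = b""
    # Step 1: arbitrary write, store "/bin/sh\0" at WRITABLE.
    chain += q(POP_RDI) + q(WRITABLE)
    chain += q(POP_RAX) + b"/bin/sh\x00"
    chain += q(MOV_PTR_RDI_RAX)
    # Step 2: set the argument registers and the syscall number.
    chain += q(POP_RDI) + q(WRITABLE)    # rdi = pointer to "/bin/sh"
    chain += q(POP_RSI) + q(0)           # rsi = NULL (argv)
    chain += q(POP_RDX) + q(0)           # rdx = NULL (envp)
    chain += q(POP_RAX) + q(SYS_EXECVE)  # rax = syscall number
    # Step 3: enter the kernel through the syscall Gadget.
    chain += q(SYSCALL)
    return chain
```

The chain is a flat sequence of 8-byte slots, so each register we cannot
set with a plain pop forces a longer detour through mov, arithmetic, or
memory Gadgets.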

Out of the 50 programs, we successfully generated the target ROP payload
for *45 programs*, a success rate of *90%*. As a side note, *48
programs* (96%) allowed us to construct an arbitrary *memory-write* ROP
payload. We have placed some of the ROP payload scripts we constructed on
GitHub
<https://anonymous.4open.science/r/roptest-benchmark-00F7/rop_example/fzero_params>.

We manually analyzed why ROP payload generation failed for each failing
program. In most cases, the failure was due to the inability to control
certain registers: the Gadgets touching those registers were few and
relatively complex, and the range of values we could place in them was
restricted. Naturally, if the attack requirements are relaxed, for example
by dropping the need to set a particular register, more programs might
succeed. We also analyzed why ROP payload generation succeeded in some
programs; the reasons can be summarized as follows:

   1. The ROP mitigation mechanism significantly reduced the availability
   of Gadgets like "*pop xxx; ret;*". However, because *x86_64*
   instructions are variable-length, many unaligned Gadgets can still be
   used. The *\xc3* byte is crucial here: decoding from any *\xc3* byte
   yields a ret, even when that byte sits inside a longer instruction.
   2. Some Gadgets that use mov and arithmetic instructions for data
   transfer need only simple calculations to set most target values.
   Furthermore, setting registers has a chain-reaction effect: once we can
   set one register, we can build on that to set more registers, and even
   memory values.
   3. Gadgets that involve memory read and write operations are also
   valuable. They are more demanding to use because they require control
   of more registers, but once arbitrary address reads or writes are
   possible, they set off a similar chain reaction: a region of memory can
   serve as an intermediary for setting more register or memory values.
   4. Conditional branch Gadgets can also be utilized.
   5. Additionally, some special opcodes come with their own exploitation
   techniques, such as retf, retfq, ret n, etc. Their presence can increase
   the effectiveness of Gadgets by 30%.
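
To make the point about unaligned Gadgets concrete, here is a minimal
Python sketch of a raw byte scan for return-like opcodes. It is not how
ropper or ROPgadget work internally (they disassemble backwards from each
candidate); the function name and the sample byte strings are my own
illustration.

```python
# Return-like opcodes on x86_64.
RET_OPCODES = {
    0xC3: "ret",
    0xC2: "ret imm16",
    0xCB: "retf",
    0xCA: "retf imm16",
}


def find_ret_bytes(code: bytes) -> list[tuple[int, str]]:
    """Return (offset, mnemonic) for every byte that could end a Gadget.

    This is a plain byte scan with no disassembly, so it reports
    candidates at every offset, whether or not the offset falls on an
    instruction boundary chosen by the compiler.
    """
    return [(i, RET_OPCODES[b]) for i, b in enumerate(code) if b in RET_OPCODES]


# The compiler emitted "pop r15; ret" (41 5f c3). Skipping the 0x41 prefix
# byte, the remaining bytes decode as "pop rdi; ret" (5f c3).
epilogue = bytes.fromhex("415fc3")
unaligned = epilogue[1:]

# "mov eax, 0xc3" is b8 c3 00 00 00: the ret byte here is part of an
# immediate operand, not an instruction the compiler intended.
mov_eax_imm = bytes.fromhex("b8c3000000")

if __name__ == "__main__":
    for name, blob in [("epilogue", epilogue), ("mov_eax_imm", mov_eax_imm)]:
        print(name, find_ret_bytes(blob))
```

Zeroing registers at function exit does not remove these bytes, which is
why unaligned Gadgets survive the mitigation.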

After completing these experiments, I have been thinking about how to
resist ROP attacks better at compile time. The primary goal is to make
Gadgets harder to use. Of course, a determined attacker is resourceful, so
the best we can do is make the task more difficult. I have some ideas and
would like to know whether they are feasible.

   1. *Reducing the number of Gadgets ending with 'ret'.* I have looked at
   the ROP mitigation measures
   <https://www.openbsd.org/papers/asiabsdcon2019-rop-paper.pdf> that
   OpenBSD applies at compile time. They found that the use of 'rbx'
   is closely related to the '\xc3' byte (the encoding of the 'ret'
   instruction). They therefore lowered the priority of 'rbx' in register
   allocation, significantly reducing Gadgets ending with 'ret' in
   programs, and, more importantly, they took unaligned Gadgets into
   account.
   2. *Increasing the number of data dependencies and side-effect fixes
   required for individual Gadgets*. Look at the instructions that precede
   each jump instruction and add "redundant instructions" (or use other
   methods) to make the Gadgets longer. This increases the number of
   data dependencies and side-effect fixes that must be satisfied for
   individual Gadgets. For example, consider the Gadget 'mov rdx, rax; mov
   qword ptr [rcx], rdx; test rax, rax; jne 0xdeadbeef; call [rbx + 0x30];'.
   If we use it to transfer data from 'rax' to 'rdx' (i.e., the 'mov
   rdx, rax;' instruction), then the other instructions in the Gadget (i.e.,
   'mov qword ptr [rcx], rdx; test rax, rax; jne 0xdeadbeef;') become side
   effects, and the final jump target is read from memory, creating a data
   dependency (the 'call [rbx + 0x30];' instruction). These side effects and
   data dependencies must be kept within specific ranges so that the Gadget
   does not crash during execution.
   3. *Increasing the proportion of conditional branch Gadgets*. Analyze
   whether the encoding of conditional branches is related to certain
   registers or instructions, and adjust code generation, without hurting
   performance, to increase the proportion of conditional branch Gadgets.
   A Gadget with conditional branches is relatively hard to use because
   the attacker must handle at least three things: setting the operand
   values, satisfying the branch condition, and controlling where
   execution goes on the branch that is taken. If the depth of conditional
   branches can also be increased, such Gadgets become even more challenging
   to use. Similarly, increasing the proportion of arithmetic instructions
   in Gadgets would help resist ROP attacks.
   4. *Reducing the occurrence of the opcodes that certain Gadget
   exploitation techniques depend on (retf, retfq, ret n)*.
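
For ideas 1 and 4, the link between register choice and return-like bytes
can be seen from the x86 ModRM encoding alone. The following Python sketch
is my own illustration (the helper names are hypothetical and it is not
taken from the OpenBSD patches): it lists which register-to-register
operand pairs produce a ModRM byte equal to one of the return opcodes.

```python
# Register numbers used in the 3-bit ModRM fields (without REX extension).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Return-like opcode bytes: ret imm16, ret, retf imm16, retf.
RET_LIKE = {0xC2, 0xC3, 0xCA, 0xCB}


def decode_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def register_form_modrm(reg: str, rm: str) -> int:
    """ModRM byte for a register-to-register operand pair (mod = 0b11)."""
    return 0xC0 | (REG64.index(reg) << 3) | REG64.index(rm)


def pairs_encoding_ret_like() -> dict[int, tuple[str, str]]:
    """Map each return-like byte to the (reg, rm) pair that encodes it."""
    found = {}
    for reg in REG64:
        for rm in REG64:
            byte = register_form_modrm(reg, rm)
            if byte in RET_LIKE:
                found[byte] = (reg, rm)
    return found


if __name__ == "__main__":
    # "mov rbx, rax" assembles to 48 89 c3: opcode 0x89 takes rm as the
    # destination and reg as the source, so (reg=rax, rm=rbx) gives 0xc3.
    for byte, (reg, rm) in sorted(pairs_encoding_ret_like().items()):
        print(f"{byte:#04x}: reg={reg}, rm={rm}")
```

Only four of the 64 register pairs hit a return-like byte, and all of them
have rbx or rdx in the rm field with rax or rcx in the reg field (r8-r15
share the same low three bits through the REX prefix). That suggests a
compiler could avoid many of these bytes by steering register allocation
or swapping operand encodings, if the performance cost is acceptable.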

Best regards,

ZoE