> >You are wrong (and even self-contradicting) here, in any case, so-called
> >information leaking can happen without having to corrupt pointers ([1],
> >[2]). Also, section 3.4.3 sublates the above.
>
> It is true that PointGuard raises new issues with regard to information
> leakage: before PointGuard, there was not much significance to leaking
> pointer values from a running process, and so this now becomes a new
> threat that needs study.

You are wrong: PG is not what raised the issue of information leaking.
It has been known for quite some time now [1] - I would say ever since
randomization schemes were implemented, although apparently not everyone
knew of them.

> Yes, PointGuard only protects pointer values generated by code compiled
> with PointGuard.

In other words, does this mean that PG as described in your paper is
vulnerable to the most basic of exploit techniques: shellcode injection
and execution?

> We are modifying the dynamic linker for Immunix. But that kind of
> hacking isn't worthy of a paper, so we omitted it.

I did not expect a full paper about ld.so hacking, but you should at
least have mentioned the fact that you needed to do it - after all, it
is part of the PG system. I would also point out that, at least on
i386, the PLT is generated by the normal linker (ld in binutils), so
you will have to modify that as well.

> >It would also be interesting to know how you can
> >handle the saved program counter and frame pointer just after the AST
> >level where as far as I know these entities do not even exist (and
> >hence cannot be manipulated/controlled there).
>
> As the paper said, we are going to tag the AST expressions so that
> spills are PG-encrypted, but this is not yet implemented.

I was not asking about register spills (which hold local variables
described at the C language level); I meant the CPU specific registers
that are used for addressing local variables (the frame pointer, EBP on
i386) and for control flow (the instruction pointer or program counter,
EIP on i386). As far as I know, these registers are not visible at the
AST level and are not (directly) controlled by any C language level
construct, therefore encrypting/decrypting their saved copies requires
special changes - what would those be?

> >2). It also begs the question of what kind of performance impact PG
> >will have once all these omissions are rectified (more on your
> >performance evaluation below).
>
> The only pointer load/stores that are not encrypted right now are
> register spills. That is a rare case, so it will not affect performance
> much.

If it is true that PG does not protect the saved instruction and frame
pointers, then the performance impact of the above changes is indeed
irrelevant: all an attacker has to do is use the good old way of
shellcode injection/execution without having to worry about PG encrypted
pointers at all. If you are going to encrypt them, then I am not
convinced that the resulting impact will be so small (see [2]).
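To make the AST point concrete, here is a minimal sketch (the function
and buffer names are made up for illustration) of why the saved EIP/EBP
are invisible to an AST level transformation:

    /* Nothing in this C source names the saved EIP or EBP; they exist
       only in the compiler-generated prologue/epilogue, i.e., below
       the AST level where PG operates. */
    void callee(void)
    {
        char buf[64];   /* overflowing buf reaches the saved EBP/EIP */

        /* i386 code emitted around this function body:
             push %ebp        ; save caller's frame pointer
             mov  %esp, %ebp  ; set up the new frame
             ...
             leave            ; restore the saved EBP
             ret              ; pop the saved EIP into the PC
           encrypting these saved values means extra instructions in
           every prologue/epilogue, not an AST transformation */
    }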
> >Also what happens
> >with functions that take format strings and hence accept arguments of
> >variable types (i.e., pointers and non-pointers), do you parse such
> >format strings and convert the pointer arguments accordingly or do
> >you turn off PG altogether for such code?
>
> There is special case handling for varargs.

I understood that; I was asking about its technical details. Consider
the following code:

    union bar {
        char *y;        /* char @ y; */
        long z;
    };

    void foo(int x, const char *f, union bar *s, int y)
    {
        if (x)
            printf(f, s->y[0]);
        else
            printf(f, s->z);
    }

Here foo() is PG protected, so 'f' and 's' are encrypted. The question
is: what happens to s->y? Is it supposed to be marked with '@' (maybe
along with s itself) and hence be used unencrypted throughout the code
(leaving it vulnerable to overflows)? If not, when will it be
decrypted? The problem is that the compiler cannot know in advance
whether the format string 'f' will contain a %c or a %ld specifier, so
neither blindly decrypting (s->y) nor leaving the value untouched
(s->z) will produce the expected result (which is defined to be the
result of non-PG compiled code).

> That is correct: unencrypted pointers are passed into the kernel.

This means that such pointers (passed as arguments on the stack) do
leak onto the stack (beyond the register spills you mentioned).

> or to have the kernel know the key value of all processes and do
> the mapping for you (which is feasible, but more intrusive than
> just hacking glibc).

Did you consider building the kernel itself with PG? In that case
sharing the (per task) encryption key between the kernel and userland
could be automatic.

> >Finally i am wondering how you plan to implement pointer mode tracking
> >in the compiler, or more precisely, why you have to do it in the compiler
> >only and not at runtime (in the latter case you would have to extend the
> >pointer representation and open a whole can of worms).
>
> I have no idea what you are talking about.

Refer to my small example above and imagine that you encrypt varargs
pointer arguments as well. As I pointed out, you cannot decide at
compile time whether to do the decryption or not, hence you would have
to carry the pointer's mode into the runtime, which means you would
have to augment the internal pointer representation, increasing its
size.
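In other words, something like the following hypothetical
representation would be needed (the type and field names are made up
purely for illustration):

    /* Runtime mode tracking would turn every pointer into a "fat"
       object carrying an encryption flag, doubling its size and
       breaking binary compatibility with non-PG code. */
    struct pg_ptr {
        void *value;     /* the (possibly encrypted) pointer bits  */
        int   encrypted; /* mode: 1 = encrypted, 0 = plaintext     */
    };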
> >6. In section 5 you admit that you do not indeed have a PG protected
> >glibc and hence heap pointers are not protected at all, this calls
> >into question the seriousness of your security and performance
> >testing (especially since you compare your results to mature
> >solutions which cannot be said of PG yet).
>
> All of the code used in our performance testing was statically linked
> and compiled with PointGuard to work around the absence of a PG version
> of glibc, so the performance figures are valid.

I do not think I am following you here. Why does statically linking in
glibc change the fact that glibc is not protected with PG (and neither
are heap pointers as a consequence)? By the way, can you please make
your 'straw man' vulnerable binary (the one shown in figure 12 in your
paper) available? I would like to take a look at it.

> >"2. Usefully corrupting a pointer requires pointing it at a
> >specific location."
> >
> >This is false, the hijacked pointer may very well point to a set of
> >specific values (e.g. any GOT entry that is used later, any member of
> >a linked list, etc).
>
> Bull: you just specified a specific location that happens to be a range.

Sorry, I am not a native speaker and misunderstood your use of
'specific location' as 'fixed/known address'. This does not change the
fact that such a pointer can still be successfully attacked without
knowing the full encryption key, see below.

> A very small range in the size of an address space. Unlike PaX/ASLR
> (which can only jiggle objects a little within a range)

What is your definition of 'little' here? ASLR randomizes the main
executable, the brk() heap, the libraries (mmap()) and the stack in
separate 256 MB ranges (the main executable and the brk() regions do
overlap somewhat). The randomness within these ranges is roughly
16/24/16/24 bits respectively (all this on 32 bit architectures, and
more on 64 bit ones).

> PointGuard has complete freedom to randomize all 32 bits of the
> pointer, so the fact that you can craft an exploit that can only
> approximately hit a target does not affect PointGuard.

Even if PG randomizes all 32 bits, it is vulnerable to partial pointer
overwrites as described in [3].

> * PointGuard provides better randomization than ASLR, because the
> randomization ranges are much greater.

Attacking ASLR means that one needs to know addresses from more than
one (differently) randomized region (e.g., a code address and a
stack/heap address, because one is forced into a return-to-libc style
attack). This quickly adds up (up to 40 bits in the mentioned case:
16 bits for the library region plus 24 bits for the stack), as one
generally cannot brute force these values independently.

> >"3. Under PointGuard protection, a pointer cannot be corrupted
> >to point to a specific location without knowing the secret key."
> >
> >This is correct provided the implementation is bug-free - something
> >that cannot be verified until you actually release PG.
>
> I have no idea what you are talking about. If the pointer is hashed, you
> *cannot* usefully corrupt it without knowing the secret key.

I can: it is called a partial pointer overwrite, and it is useful
precisely when the 'specific location' you want to hit is a range.
Imagine that in the future PG encrypts the saved program counter as
well and you have a large enough overflowable stack buffer:
overwriting the least significant byte of the saved program counter
will transfer control somewhere within a 256 byte range of the
original return address (or a 64 kB range for a 2 byte overwrite). If
that range contains any byte sequence that does something like 'jmp
register', and the register happens to hold a value pointing back to
the buffer (which can happen since you do not reload all registers on
function return, so a plaintext pointer can leak back to the caller),
you will execute shellcode. Even a non-executable stack is just a
workaround here: you may very well have a plaintext heap pointer in
the register (which in fact you do, since PG does not handle heap
pointers yet) and return there, if you can ensure that a copy of the
shellcode is there.
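To make the partial overwrite concrete, here is a minimal sketch
(assuming the XOR encryption scheme described in the PG paper; the key
and address values are made up for illustration):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t key = 0xdeadbeef; /* per-process secret key     */
        uint32_t ptr = 0x080484f0; /* some saved code address    */
        uint32_t enc = ptr ^ key;  /* PG-style encrypted pointer */
        uint32_t dec;

        /* a 1 byte overflow clobbers only the low byte of the
           encrypted value (byte 0 on little-endian i386) */
        ((uint8_t *)&enc)[0] = 0x42;

        dec = enc ^ key;           /* decryption at point of use */

        /* the upper 24 bits still decrypt correctly, so the corrupted
           pointer lands within the same 256 byte window as the
           original, with no knowledge of the key required */
        printf("original %08x, corrupted %08x\n",
               (unsigned)ptr, (unsigned)dec);
        return 0;
    }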
> Speculating that any piece of software has bugs without foundation
> borders on FUD,

It is not FUD, it is a simple reminder that unless your work is made
public, it cannot be evaluated, let alone trusted to be correct. I will
remind you that in [4] you stated:

  "We describe StackGuard: a simple compiler technique that virtually
   eliminates buffer overflow vulnerabilities with only modest
   performance penalties."

As we know, this did not prove to be true: StackGuard was found to be
circumventable ([5], [6]). I personally believe that the more eyes can
scrutinize a system, the lower its error rate will be. It is in your
best interest to make your work available and withhold judgement until
others have looked it over - you did not prove to be infallible in the
past, so why would you have created a bug-free system this time
(especially considering how much more complex it is to hack the AST
than function prologue/epilogue generation)? For now I would even
appreciate just a look at PG compiled binaries at the assembly level;
that would tell me what exactly is randomized and how vulnerable PG is.

> but in this case it isn't even possible: an encrypted pointer cannot be
> modified by a plaintext overflow. A bug that accidentally laid a
> plaintext pointer would result in a crash when the value is decrypted,
> and vice versa: the design specifically resists this problem.

Three words: partial pointer overwrite.

> Speculate away as to what PointGuard will do when we're done integrating
> it. On second thought, don't: you've done more than enough flaming
> speculation today :)

It seems that my only mistake was an unfortunate consequence of my
language skills (or lack thereof); otherwise you have only confirmed my
'speculations'. In any case, I look forward to trying out PG once it is
available (or at least your test binaries).

> >Third, there is related work ([4] and [5], all of which predates PG
> >by years and you failed to reference) that appears to show more real
> >performance impact of function pointer encryption (something PG does
> >not seem to do yet universally).
>
> That work is in fact based directly on PointGuard, having resulted from
> this post http://lwn.net/1999/1111/a/stackguard.html

Which of the two works are you referring to here?

> And you're on crack if you think their performance results are more
> realistic: the only "pointer" they encrypt is the activation return
> address. *None* of the hard work of weaving pointer encryption into the
> compiler's type system was done. They published first because we chose
> deliberately to not publish an empty idea with no implementation.

Assuming you are talking about [5] above, I am at a loss to interpret
your words ;-). Their performance results indicate a much bigger impact
(yes, positive, i.e., a slowdown) than what you reported in your PG
paper. Now, if you think they encrypt even less than PG (so far you
seem to have confirmed that PG does *not* in fact encrypt the program
counter), then one would expect your numbers to be even worse than
theirs (i.e., even more slowdown, and definitely no speed-up). Since
you imply that their numbers are less realistic than yours (I hope you
did not seriously expect a speed-up from them), you can draw the
conclusion that your numbers are even less realistic than theirs -
which is what I have been saying myself. Also, do you realize that the
'only' pointer they encrypt happens to be the one most vulnerable to
execution flow redirection style attacks as well?

> >9. In section 7.1 you say that:
> >whereas you admit before that PG requires programmer intervention (as it
> >is not possible to have a pure PG system right now), i doubt a programmer
> >can compile (port) millions of lines of code in a day.
>
> You are entitled to your opinion on the numbers and magnitudes, but it
> is inescapable that "porting" to PointGuard is far less work than
> porting from C to Cyclone or CCured. So what's your point?

The point is that augmenting existing code for PG use takes time (since
you do not have a full PG system), and you have shown no data to back
up your claim. When PG becomes able to recompile the entire userland
without any intervention, then you can claim your 'millions of lines of
code in a day'.

> >Where is this "exec(sh)" supposed to be 'almost always'? Can you
> >substantiate this claim?
>
> It is in glibc, and most programs link to glibc. This is very well
> known, and I didn't think it needed to be justified.

Last time I checked, glibc did not contain any exec(sh) code. What it
does contain is system(), which will invoke 'sh -c' and exit if you
fail to pass a proper argument, and the various wrappers around the
execve() system call, which will also fail if you do not pass the
proper arguments. The point I am making is that despite a handy
execve() in glibc, you also have to be able to provide the arguments
(pointers) before you can (ab)use it in an exploit. Therefore
randomization (ASLR in PaX) prevents this kind of exploitation in a
probabilistic sense (information leaking aside), and you should have
pointed out that composing various techniques (such as those in PaX)
provides more protection than using them standalone - it is an
important part of the PaX philosophy, as you will see once you read the
documentation.
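To make the argument concrete, here is a minimal sketch of the pointer
material such an attack must forge (illustrative values only; the point
is in the comment):

    #include <unistd.h>

    int main(void)
    {
        char *argv[] = { "/bin/sh", (char *)0 };

        /* a return-to-libc attack needs the address of execve() in
           the library region *and* the addresses of "/bin/sh" and
           argv[] in writable memory; under ASLR each of these lives
           in a differently randomized region */
        execve("/bin/sh", argv, (char **)0);
        return 0;
    }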
> I have been *trying* to properly cite PaX in various papers for at least
> a year, but you don't make it easy. A web URL is not normally considered
> a suitable citation.

Is it not? Then how do you explain that you provided web URLs for
projects like Solar Designer's kernel patch?

> research community is unaware of PaX. I dare say that the PointGuard
> paper will do more to raise PaX visibility in the research community
> than anything before. That was deliberate, because IMHO PaX is
> under-exposed: it's good work, and few have heard of it.

Shall I also thank google then, for helping out all the lost souls who
would otherwise not be able to find the PaX webpage, because for some
yet to be explained reason you had failed to provide it yourself?

> >I am curious to learn why you cited this information
> >when you have already been made aware of the current situation ([13]).
>
> It was hearsay. Publish something, and I'll cite it. Please.

Is hearsay what can be published in a paper accepted at a 'strong
refereed conference', then? How about testing it yourself? You have the
source code at your disposal, after all. I also doubt that you would
cite anything I write, considering how you failed to do so in the past,
even after you had been made aware of the documentation (which you had
apparently failed to read). In any case, a separate document on PaX
performance is not out of the question, but I would rather do it when
all architectures can be properly evaluated (right now, due to some
crappy userland code, RISC architectures need special emulation that I
will get rid of eventually).

> Go look up the word "dual": it is a mathematical term. What you're
> saying is exactly the same as what I am saying.

Ok, I did, and according to [7]: "Every field of mathematics has a
different meaning of dual." Oops. "Loosely, where there is some binary
symmetry of a theory, the image of what you look at normally under this
symmetry is referred to as the dual of your normal things." I cannot
really say that I got what you meant by 'dual', but let's assume we
were really meaning the same thing ;-).

> >This is misleading because Address Obfuscation is vulnerable to the exact
> >same information leaking problem as ASLR or PG, otherwise an attacker has
> >to guess addresses (if he needs any, that is), there is no (deterministic)
> >way around that.
>
> It is *your job*, not mine, to go write a paper explaining how PaX/ASLR
> is better than Sekar et al.

Actually, it is not my job (more below); I was merely pointing out that
your comparison table has more flaws beyond PaX.
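For completeness, here is a minimal sketch of the kind of information
leak I mean (the function is made up; no pointer corruption is
involved):

    #include <stdio.h>

    /* bug: user-controlled format string; calling this with
       "%p.%p.%p" prints live stack contents, disclosing randomized
       addresses - or encrypted pointers, where (assuming the XOR
       scheme) one known plaintext value reveals the key directly */
    void log_message(const char *msg)
    {
        printf(msg);    /* should be printf("%s", msg) */
    }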
> In the absence of such a paper, I'm having to guess at the differences,
> in a very small portion of my paper.

Why do you have to guess at the differences, and why did I not have to?
I can assure you that we have access to the exact same information: the
AO paper and the PaX source code and documentation (and I am sure that
you could have asked questions of either party had you needed to).

> I vigorously encourage you to go write a real paper and
> submit it to a strong refereed conference such as USENIX Security.

Thank you, but I have other ideas about a 'strong refereed conference'.
Besides, I am not into academia; my interests lie elsewhere.

> Had you done this two years ago, you would not be having this silly
> flame war over W^X with Theo.

Thanks for the advice, but I have already seen in this very thread
([8]) where publishing papers in this area leads; I do not think I need
any more of it.

[1] http://www.phrack.org/show.php?p=58&a=4
[2] http://link.springer.de/link/service/series/0558/bibs/2513/25130025.htm
[3] http://www.phrack.org/show.php?p=59&a=9
[4] http://immunix.org/StackGuard/usenixsc98.pdf
[5] http://www.phrack.org/show.php?p=56&a=5
[6] http://www1.corest.com/common/showdoc.php?idx=242&idxseccion=11
[7] http://dictionary.reference.com/search?q=dual
[8] http://marc.theaimsgroup.com/?l=bugtraq&m=106124676623652&w=2