On Tue, Feb 03, 2004 at 07:36:55AM -0500, Dave Paris composed:
> I'm not sure I understand the economics involved here.  Taking the
> worst-case (software) cited at an 8.3% performance hit, this says a 3.2GHz
> P4 will give approximately the same performance as a 2.9GHz machine.  Or put
> another way, for every 12 machines I have operating on a problem (say, in a
> cluster of some sort), I have to add in one additional machine to make up
> for the performance hit.  If we're talking about commodity, x86 server type
> hardware, we're not talking about a lot of money, even if you factor in the
> additional costs for another switch port, etc.  Certainly not the kind of
> money one would expect to be kicking around for custom CPUs - which I would
> guess to be _well_ in excess of SPARC or PA-RISC prices.

Especially since StackGuard/ProPolice/GS flags are now becoming
widespread (on by default in current/future M$ products, on by default
in OpenBSD), and the overhead in practice is a fair bit lower than the
8% worst-case hit.

E.g.,
http://www.research.ibm.com/trl/projects/security/ssp/node5.html#SECTION00051000000000000000

8% is the WORST CASE overhead for ProPolice: a function with a
character array and a function call.  In practice, it's a 4%
performance overhead for perl, 1% for ctags, and 0% for imapd.

Amdahl's law works in both directions: you can also slow down
infrequent events and have very little impact on overall performance.

> I think the project/product is quite interesting from an academic
> standpoint, but unless it can be put into mainstream production with
> existing vendors, my realistic side says it'll never be economically
> feasible to get out of academia.

Speaking as an academic and an architecture type, I'm not even sure
it's all that interesting from an academic viewpoint...  There are
lots of changes/studies that could be done in the architecture field
which would offer much bigger performance improvements.

(Both of the following are projects I'd like to see architecture-class
students do, as I don't have the time...)

E.g., what happens to a RISC instruction set, or even better, an
IA64-like ISA, when you add an "aligned 4/aligned 8 register"
spill/load instruction?  I know this is reasonable from a physical
viewpoint: wires are cheap, it's interconnect (switching) that's
expensive, and with 3+ issue there are definitely enough wires to the
register file to access 4 or 8 locations in a single cycle.  But I
don't know the effect from the compiler viewpoint, or what the overall
performance result would be.

Or what happens if you make a 3-issue IA64: how much slowdown over a
full 6-issue Itanium II do you get in practice?  I think this could be
estimated by doing a binary translation to insert stop bits.  What
happens if you then interleave-multithread (C-slow) it, dropping the
single-thread clock by another 10-20% but now supporting two
independent threads?  This might take a simulator.

Or much bigger improvements in security/security performance: better
hardware/software codesign for the real-time garbage collection and
runtime bounds checks needed for safe languages.  There was a lot of
work done on this in the 80s; what happens if it gets revisited today,
with much cheaper silicon?
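Some back-of-the-envelope sketches, for the concrete-minded:

1) Roughly what the ProPolice worst case (a character array plus a
function call) boils down to, hand-written in C.  This is only an
illustration of the per-call cost, not the compiler's actual output:
the real guard value is randomized at program start, and the locals
get reordered so arrays sit next to the canary.  The point is that the
instrumentation amounts to a store, a load, a compare, and a
conditional branch per protected call, which is why the measured
overhead stays small.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Stand-in for the guard value ProPolice sets up (randomly) at startup. */
  static unsigned long stack_guard = 0xdeadbeefUL;

  static void stack_smash_handler(const char *func)
  {
      fprintf(stderr, "stack smashing detected in %s\n", func);
      abort();
  }

  void copy_name(const char *src)
  {
      unsigned long canary = stack_guard;   /* prologue: plant the canary */
      char buf[64];                         /* the character array that   */
                                            /* triggers instrumentation   */
      strncpy(buf, src, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';
      printf("hello, %s\n", buf);           /* the function call          */

      if (canary != stack_guard)            /* epilogue: check the canary */
          stack_smash_handler("copy_name");
  }

  int main(int argc, char **argv)
  {
      copy_name(argc > 1 ? argv[1] : "world");
      return 0;
  }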
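2) The Amdahl's-law arithmetic, with numbers made up purely for
illustration: if a fraction f of the run time is spent in instrumented
prologues/epilogues and the instrumentation makes that fraction s times
slower, the whole program only pays f*(s-1).

  #include <stdio.h>

  int main(void)
  {
      double f = 0.10;   /* assume 10% of cycles in instrumented code */
      double s = 1.40;   /* assume that code runs 40% slower          */

      /* old run time = 1, new run time = (1 - f) + f*s */
      double total = (1.0 - f) + f * s;
      printf("overall slowdown: %.1f%%\n", (total - 1.0) * 100.0);   /* 4.0% */
      return 0;
  }

Slowing down something that rarely runs costs almost nothing, which is
the "both directions" point above.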
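3) The kind of runtime bounds check a safe language pays for on every
indexed access, i.e. the frequent, tiny operation that better
hardware/software codesign would try to make (nearly) free.  The
checked-array struct and trap function here are made up for
illustration; they aren't any particular runtime's representation.

  #include <stdio.h>
  #include <stdlib.h>

  struct checked_array {
      size_t len;
      int    data[];   /* C99 flexible array member */
  };

  static void bounds_trap(size_t idx, size_t len)
  {
      fprintf(stderr, "index %zu out of bounds (len %zu)\n", idx, len);
      abort();
  }

  static int checked_get(const struct checked_array *a, size_t idx)
  {
      if (idx >= a->len)              /* the per-access check: a compare  */
          bounds_trap(idx, a->len);   /* and branch that hardware support */
      return a->data[idx];            /* could make nearly free           */
  }

  int main(void)
  {
      struct checked_array *a = malloc(sizeof(*a) + 8 * sizeof(int));
      if (a == NULL)
          return 1;
      a->len = 8;
      for (size_t i = 0; i < a->len; i++)
          a->data[i] = (int)(i * i);

      printf("%d\n", checked_get(a, 3));   /* fine  */
      printf("%d\n", checked_get(a, 8));   /* traps */
      free(a);
      return 0;
  }

-- 
Nicholas C. Weaver                               nweaver@cs.berkeley.edu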