On Tue, Feb 03, 2004 at 07:36:55AM -0500, Dave Paris composed:
> I'm not sure I understand the economics involved here.  Taking the
> worst-case (software) cited at an 8.3% performance hit, this says a 3.2GHz
> P4 will give approximately the same performance as a 2.9GHz machine.  Or put
> another way, for every 12 machines I have operating on a problem (say, in a
> cluster of some sort), I have to add in one additional machine to make up
> for the performance hit.  If we're talking about commodity, x86 server type
> hardware, we're not talking about a lot of money, even if you factor in the
> additional costs for another switch port, etc.  Certainly not the kind of
> money one would expect to be kicking around for custom CPUs - which I would
> guess to be _well_ in excess of SPARC or PA-RISC prices.

Especially since StackGuard/ProPolice/GS flags are now becoming
widespread (on by default in current/future M$ products, on by default
in OpenBSD), and the overhead in practice is a fair bit lower than the
8% worst-case hit.

E.g.,
http://www.research.ibm.com/trl/projects/security/ssp/node5.html#SECTION00051000000000000000

8% is the WORST CASE overhead for ProPolice: a function with a
character array and a function call.  In practice, it's a 4%
performance overhead for perl, 1% for ctags, and 0% for imapd.

Amdahl's law works in both directions: you can also slow down
infrequent events and have very little impact on overall performance.

> I think the project/product is quite interesting from an academic
> standpoint, but unless it can be put into mainstream production with
> existing vendors, my realistic side says it'll never be economically
> feasible to get out of academia.

Speaking as an academic and an architecture type, I'm not even sure
it's all that interesting from an academic viewpoint...  There are
lots of changes/studies that could be done in the architecture field
which would offer much bigger performance improvements.

(Both of the following are projects I'd like to see architecture-class
students do, as I don't have the time...)

E.g., what happens to a RISC instruction set, or even better, an
IA64-like ISA, when you add an "aligned 4/aligned 8 register"
spill/load instruction?  I know this is reasonable from a physical
viewpoint: wires are cheap, it's interconnect (switching) that's
expensive, and with 3+ issue there are definitely enough wires to the
register file to access 4 or 8 locations in a single cycle.  But I
don't know the effect from the compiler viewpoint, or what the overall
performance result would be.

Or what happens if you make a 3-issue IA64: how much slowdown over a
full 6-issue Itanium II do you get in practice?  I think this could be
estimated by doing a binary translation to insert stop bits.  What
happens if you then interleave-multithread (C-slow) it, dropping the
single-thread clock by another 10-20% but now supporting two
independent threads?  This might take a simulator.

Or much bigger improvements in security/security performance: better
hardware/software codesign for the real-time garbage collection and
runtime bounds checks needed for safe languages.  There was a lot of
work done on this in the 80s; what happens if it gets revisited today,
with much cheaper silicon?
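Some back-of-the-envelope sketches, for the concrete-minded:

1) Roughly what the ProPolice worst case (a character array plus a
function call) boils down to, hand-written in C.  This is only an
illustration of the per-call cost, not the compiler's actual output:
the real guard value is randomized at program start, and the locals
get reordered so arrays sit next to the canary.  The point is that the
instrumentation amounts to a store, a load, a compare, and a
conditional branch per protected call, which is why the measured
overhead stays small.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Stand-in for the guard value ProPolice sets up (randomly) at startup. */
  static unsigned long stack_guard = 0xdeadbeefUL;

  static void stack_smash_handler(const char *func)
  {
      fprintf(stderr, "stack smashing detected in %s\n", func);
      abort();
  }

  void copy_name(const char *src)
  {
      unsigned long canary = stack_guard;   /* prologue: plant the canary */
      char buf[64];                         /* the character array that   */
                                            /* triggers instrumentation   */
      strncpy(buf, src, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';
      printf("hello, %s\n", buf);           /* the function call          */

      if (canary != stack_guard)            /* epilogue: check the canary */
          stack_smash_handler("copy_name");
  }

  int main(int argc, char **argv)
  {
      copy_name(argc > 1 ? argv[1] : "world");
      return 0;
  }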
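2) The Amdahl's-law arithmetic, with numbers made up purely for
illustration: if a fraction f of the run time is spent in instrumented
prologues/epilogues and the instrumentation makes that fraction s times
slower, the whole program only pays f*(s-1).

  #include <stdio.h>

  int main(void)
  {
      double f = 0.10;   /* assume 10% of cycles in instrumented code */
      double s = 1.40;   /* assume that code runs 40% slower          */

      /* old run time = 1, new run time = (1 - f) + f*s */
      double total = (1.0 - f) + f * s;
      printf("overall slowdown: %.1f%%\n", (total - 1.0) * 100.0);   /* 4.0% */
      return 0;
  }

Slowing down something that rarely runs costs almost nothing, which is
the "both directions" point above.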
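3) The kind of runtime bounds check a safe language pays for on every
indexed access, i.e. the frequent, tiny operation that better
hardware/software codesign would try to make (nearly) free.  The
checked-array struct and trap function here are made up for
illustration; they aren't any particular runtime's representation.

  #include <stdio.h>
  #include <stdlib.h>

  struct checked_array {
      size_t len;
      int    data[];   /* C99 flexible array member */
  };

  static void bounds_trap(size_t idx, size_t len)
  {
      fprintf(stderr, "index %zu out of bounds (len %zu)\n", idx, len);
      abort();
  }

  static int checked_get(const struct checked_array *a, size_t idx)
  {
      if (idx >= a->len)              /* the per-access check: a compare  */
          bounds_trap(idx, a->len);   /* and branch that hardware support */
      return a->data[idx];            /* could make nearly free           */
  }

  int main(void)
  {
      struct checked_array *a = malloc(sizeof(*a) + 8 * sizeof(int));
      if (a == NULL)
          return 1;
      a->len = 8;
      for (size_t i = 0; i < a->len; i++)
          a->data[i] = (int)(i * i);

      printf("%d\n", checked_get(a, 3));   /* fine  */
      printf("%d\n", checked_get(a, 8));   /* traps */
      free(a);
      return 0;
  }

-- 
Nicholas C. Weaver                               nweaver@cs.berkeley.edu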