Introduction

It has become increasingly clear that the PowerPC processor family needs additional (machine) targets for the Linux distributions. At present, Linux has only two targets for PowerPC: powerpc32 and powerpc64 (powerpc is a synonym for powerpc32). These targets address only the two operating modes (32- and 64-bit) and not the wide range of processor families and chips available. With only one target per mode, we are forced to compile for a common subset of powerpc instructions and for default instruction scheduling. Because PowerPC processors range from embedded systems to large servers, we are sacrificing performance for commonality even when it is not strictly required.

The PowerPC architecture has been around for a long time, and consequently the common subset (for tuning and instructions) has become less and less relevant to the systems that have actually shipped in the last few years. Moreover, a common subset prevents exploitation of the microarchitectural differences between POWER4, POWER5, and ppc970. The addition of a new processor called POWER6 (http://www-128.ibm.com/developerworks/power/newto.html) may introduce even more microarchitectural differences.

The goal of this proposal is to:

* Improve application performance on current distributions.
* Allow applications running on a machine to exploit the CPU-specific tuned libraries available on that machine.
* Provide a general framework for CPU-specific tuning for the PowerPC architecture.

Approach

The approach we are proposing is to:

* Allow multiple processor-specific (performance-tuned) assembler implementations of core memory and string functions (memcpy, memset, memcmp, ...).
* Allow multiple processor-specific (performance-tuned) implementations of Math library (libm) functions.
* Allow the compiler (and assembler implementations) to use new instructions beyond the common powerpc subset (-mcpu=).
* Allow the compiler to tune (schedule instructions) for specific processor families (-mtune=).
* Allow the tuning of various glibc functions based on the processor family. For example, the malloc DEFAULT_MMAP_THRESHOLD should be higher on a POWER4/5 server.
* Allow distros to build multiple (processor-tuned) versions of the glibc libraries and install the correct version on the target system.

The intent is to be similar to IA32 with the i386/i486/i586/i686/i786 machine targets and to sparc with the sparc/sparcv8/sparcv9/sparcv9b machine targets. The added twist for powerpc is biarch support for the powerpc32 and powerpc64 ABIs, so each 64-bit machine target needs a suffix to distinguish the 32- and 64-bit ABIs. Sparc is similar with the sparc64/sparc64b machine targets, but the issue is more pervasive in PowerPC because all POWER3/4/5 (and 970) machines are 64-bit implementations that support both ABIs but may require different tuning.

So I am proposing to add new "machine" targets to the powerpc family. The target names will follow the POWER3, POWER4, POWER5, ... naming of the current IBM Server brands and add a _32/_64 suffix to support biarch systems.

Retain (compatible with all existing Linux on Power systems):

    powerpc (a synonym for powerpc32)
    powerpc32
    powerpc64

And add:

    power4_32
    power4_64
    power5_32
    power5_64
    ppc970_32
    ppc970_64

Or alternatively:

    powerpc32_power4
    powerpc64_power4
    powerpc32_power5
    powerpc64_power5
    powerpc32_970
    powerpc64_970

I see no need to support separate (from the existing powerpc32/64) POWER3 and RS64IV targets at this time. The POWER3 systems are quite old, and the RS64IV systems are "strongly storage consistent" machines. The POWER4, POWER5, and PPC970 processors allow "weak storage consistency" and are more aggressively pipelined for out-of-order instruction execution. This difference requires very different instruction scheduling for optimal performance.
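To make the first naming scheme concrete, here is a minimal sketch of how a proposed target name could be split into a processor family and an ABI width. It is only an illustration: the function name split_target and the "generic" label for the retained targets are hypothetical and not part of the proposal, and the alternative powerpc32_power4 style of names is not handled.

```python
# Hypothetical helper: split a proposed machine target name into
# (processor family, ABI width in bits). Names follow the first scheme
# in the proposal (power4_32, ppc970_64, ...).

# Retained targets; "powerpc" is a synonym for powerpc32.
GENERIC_TARGETS = {"powerpc": 32, "powerpc32": 32, "powerpc64": 64}

# Families the proposal adds. POWER3 and RS64IV are deliberately absent.
TUNED_FAMILIES = ("power4", "power5", "ppc970")


def split_target(name):
    """Return (family, bits) for a target name, or raise ValueError."""
    if name in GENERIC_TARGETS:
        # "generic" is an illustrative label for the common-subset targets.
        return ("generic", GENERIC_TARGETS[name])
    family, sep, bits = name.rpartition("_")
    if sep and family in TUNED_FAMILIES and bits in ("32", "64"):
        return (family, int(bits))
    raise ValueError("unknown powerpc machine target: %r" % name)


if __name__ == "__main__":
    for target in ("powerpc", "powerpc64", "power4_32", "ppc970_64"):
        print(target, "->", split_target(target))
```

The suffix carries the ABI, so one 64-bit processor family always yields two targets, which is the biarch twist described above.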
Glibc and other package changes

The changes needed to enable additional targets for glibc include:

* Add the new machine targets to ./scripts/config.sub (and in autoconf).
* Update the base_machine and machine mapping for the new targets in ./configure.in.
* Add the new target patterns to ./shlib-versions and ./nptl/shlib-versions.
* Provide additional ./scripts/data/c++-types-power*-linux-gnu.data files to match the new machine targets.
* Update the ./abilist/* files to cover the new machine targets.

The various targets need to be represented in the CVS directory structure of glibc. Each of the new targets we are proposing supports both 32- and 64-bit mode and is compatible with the current powerpc32 or powerpc64 target. So the current directory structure will be extended with subdirectories under the current powerpc[32|64] directories. For example, the directory ./sysdeps/powerpc/powerpc32 contains 32-bit implementations common to powerpc, while ./sysdeps/powerpc/powerpc32/power4 contains 32-bit implementations that can use instructions or optimizations available on POWER4 processors. Similarly for 64-bit: ./sysdeps/powerpc/powerpc64/power4. Finally, the directory ./sysdeps/powerpc/powerpc32/powerpc64 could contain 32-bit code that uses instructions available only on 64-bit powerpc implementations.

The config.guess script is a bit problematic but not strictly required to support this proposal. config.guess depends on "uname --machine" to guess the machine target. However, the powerpc64 kernel currently reports "ppc64" for all models. So unless the kernel is changed to report different machine strings, or the uname command is enhanced to report useful "--processor" data, updating config.guess is moot. This is not critical, as a biarch glibc build should not depend on config.guess anyway, and other projects will be safe with the default powerpc/powerpc64 targets.

Finally, we need to provide more information in the aux vector entry AT_HWCAP.
AT_HWCAP is used by rpm to select libraries that match the processor at install time (at least for i[34567]86 Linux systems). We will need to add AT_HWCAP flags to allow rpm to do the same for powerpc.

Detailed discussion

Note: I am ignoring the little-endian variants powerpcle/powerpc64le because I don't know of anyone building those for Linux.

Note: I am not ignoring the Apple G5 in this discussion. The IBM970 chip core is derived from the POWER4+ core, so any tuning for POWER4 (-mtune=power4) benefits the G5 for both 32- and 64-bit applications. But this tuning would not benefit 32-bit applications running on a G3 or G4. The processors (G3 vs G4 vs G5) are from different manufacturers and have very different internal structures (micro-architectures).

The ppc970 processor raises an interesting question. If the ppc970 resembles the POWER4+, do we need a separate (from power4) target for ppc970? The ppc970 is a 64-bit implementation based on the POWER4+, with the addition of the Altivec vector SIMD instructions (two additional execution pipelines). Our analysis is that glibc (libc, libm, libpthread, ...) would not benefit from direct exploitation of the Altivec instruction set, so a power4 target would be enough for glibc. While our current proposal is focused on glibc, other libraries/projects (gd, jpeg, libtiff, mad, ...) might benefit from using Altivec. This will become more attractive in the gcc-4.1 timeframe, when autovectorization will be fully functional. So we should add ppc970 targets for completeness.

In the PowerPC Architecture there are several FPU instructions that are listed as "optional" but implemented on all current 64-bit hardware. There are also instructions that are defined only for 64-bit hardware but are usable in 32-bit mode.
Optional instructions:

    Store Floating-Point as Integer Word Indexed (stfiwx)
    Floating Square Root (fsqrt)
    Floating Square Root Single (fsqrts)
    Floating Reciprocal Estimate Single (fres)
    Floating Reciprocal Square Root Estimate (frsqrte)
    Floating Select (fsel)

64-bit hardware only instructions, usable in 32-bit mode:

    Floating Convert To Integer Doubleword (fctid)
    Floating Convert To Integer Doubleword with round toward Zero (fctidz)
    Floating Convert From Integer Doubleword (fcfid)

Instructions added for POWER5:

    Bytewise popcount (popcntb)
    Floating Reciprocal Estimate Double (fre)
    Data Cache Block Flush Local (dcbfl)

With the current generic powerpc targets, these instructions are not generated by gcc.

We also need to identify the processor type from the AT_HWCAP aux vector entry. For example, we could use the following:

    HWCAP bits                                      Processor type
    PPC_FEATURE_POWER4                              power4
    PPC_FEATURE_POWER5                              power5
    PPC_FEATURE_POWER4 + PPC_FEATURE_HAS_ALTIVEC    ppc970

Note: PPC_FEATURE_64 is an existing bit that is set for all 64-bit powerpc kernels. PPC_FEATURE_HAS_ALTIVEC is an existing bit that is set for the 970 processors.

One problem remains. For glibc at least, the i[34567]86 and sparc[v8,v9,v9b] targets allow customized assembler implementations for each variant, but this does not adjust the gcc -mcpu and -mtune options. The directory structure above does provide the opportunity to add Makefile fragments in architecture-specific directories. For example, add a Makefile fragment to ./sysdeps/powerpc/powerpc64/power5 with the line:

    +cflags += -mcpu=power4 -mtune=power5

These +cflags options are applied to all *.c compiles in the build but not to *.S compiles. This would allow gcc to use all instructions available in the PowerPC architecture, with instruction scheduling appropriate for the POWER5 processor. The resulting code would still be portable to POWER4 and PPC970 systems.
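Returning to the HWCAP table above, the following is a minimal sketch of how an AT_HWCAP value could be mapped to a processor type. It rests on several assumptions: the numeric bit values are illustrative and should be taken from the kernel headers in practice, the function name processor_type is hypothetical, checking the POWER5 bit before the POWER4 bit is my own ordering choice, and the "powerpc" fallback for everything else is not specified by the table.

```python
# Illustrative bit values (assumptions); use the kernel's definitions
# in real code.
PPC_FEATURE_64 = 0x40000000           # set for all 64-bit powerpc kernels
PPC_FEATURE_HAS_ALTIVEC = 0x10000000  # set for the 970 processors
PPC_FEATURE_POWER4 = 0x00080000       # proposed flag
PPC_FEATURE_POWER5 = 0x00040000       # proposed flag


def processor_type(hwcap):
    """Map an AT_HWCAP bit mask to a processor type from the table."""
    if hwcap & PPC_FEATURE_POWER5:
        return "power5"
    if hwcap & PPC_FEATURE_POWER4:
        # POWER4 core plus Altivec identifies the 970.
        if hwcap & PPC_FEATURE_HAS_ALTIVEC:
            return "ppc970"
        return "power4"
    # Assumed fallback: no tuned family recognized, use the common subset.
    return "powerpc"


if __name__ == "__main__":
    sample = PPC_FEATURE_64 | PPC_FEATURE_POWER4 | PPC_FEATURE_HAS_ALTIVEC
    print(hex(sample), "->", processor_type(sample))
```

Note that PPC_FEATURE_64 plays no part in the decision; it only says that the kernel is 64-bit, so a tool such as rpm would still have to pick the _32 or _64 variant of the library separately.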
As another example, add a Makefile fragment to ./sysdeps/powerpc/powerpc64/ppc_970 with the line:

    +cflags += -mcpu=970 -maltivec -mabi=altivec

This would enable the full PowerPC instruction architecture plus VMX/Altivec. Through autovectorization, GCC would be allowed to use VMX instructions even in code that does not explicitly use altivec.h types. The resulting libraries would not be portable to POWER4/5 systems but would be optimized for the 970 (IBM JS20 and Apple G5).

Steven J. Munroe
Linux on Power Toolchain Architect
IBM Corporation, Linux Technology Center