Greetings GCC users and developers!
I've recently embarked upon a possibly futile effort to create a script
to bootstrap a GNU toolchain - binutils, gcc, and glibc - from a system
with the most minimal of prerequisites. My goal is to have a script
that can, on any build host, create a toolchain that runs on a
specific host type (which may or may not be the same as the build
system type) and targets a possibly different target type. In
other words, I'm trying to build a single script that can build a
toolchain for any arbitrary combination of build, host, and target
system type.
I've gotten pretty far, although it's taken quite a long time to
understand the intricate dance that must be performed to bootstrap gcc
and glibc. It has also required some patches to glibc, mostly to get
past what I consider to be a deficiency in its autoconf scripts: namely,
that they error out on tests that check for a working linker, when in
fact glibc ought to be buildable without any linker at all (its utility
programs can't be built that way, but they aren't necessary during the
bootstrapping process). At this point I can produce a working binutils
and glibc, but the "final" build of gcc is giving me some problems that
I am still working through.
But what I really want is to describe my general approach and get some
validation of it, and of my assumptions about its value (or lack
thereof).
Basically, I want my build script to not assume that the system compiler
is anything other than an ISO C90 compiler, with a standard C library
that may not have anything to do with glibc but is a complete
implementation of the C library standard.
This means I can't simply compile all the tools with the system
compiler; the only tools I can build with it are binutils and gcc.
glibc itself has a strict requirement
that it be compiled with gcc and I don't even want to assume that any
old version of gcc on the system is sufficient; I'd rather let the
version of gcc being bootstrapped be the one to compile glibc. In this
way it's up to the toolchain builder to choose versions of binutils,
gcc, and glibc that are known to work together, not to require that his
or her build system has a binutils, gcc, or glibc that is compatible
with whatever target versions are being built. Like I said, I just want
to assume an ISO C90 compiler and C library, and nothing more.
The set of steps that I have come up with to accomplish this
bootstrapping is:
1. Build binutils
2. Build stage1 gcc, building just the "gcc" and "install-gcc" targets,
not the full build (which would try to compile libraries that require
glibc, which has not yet been built)
3. Build stage1 glibc using the stage1 gcc compiler; this uses the
binutils from (1) and the stage1 gcc from (2). This version of glibc is
built with only static libraries and without any of the helper programs
of glibc, because the stage1 gcc cannot build shared libraries or
executables.
4. Build stage2 gcc against the stage1 glibc, with executable and shared
library support, but without libmudflap which cannot be built against
the purely static stage1 glibc.
5. Build final glibc with stage2 gcc; this is a complete and final glibc
with shared library support and support for all features.
6. Build final gcc against final glibc, which is a complete gcc with
full support for all features.
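The six steps can be modeled as a small dependency list, which makes it easy to check the property this approach is after: every step consumes only the system C compiler or something produced by an earlier step, and glibc is never compiled by the system compiler. This is a hypothetical Python sketch; the artifact labels (`system-cc`, `gcc-stage1`, and so on) are my own, not names from the actual script, and the model ignores gcc's support libraries (zlib, gmp, mpfr, mpc).

```python
# Hypothetical model of the six bootstrap steps: (step name, inputs, output).
# "system-cc" stands for the build host's ISO C90 compiler and C library,
# the only thing assumed to exist before step 1.
STEPS = [
    ("binutils",     ["system-cc"],                             "binutils"),
    ("stage1 gcc",   ["system-cc", "binutils"],                 "gcc-stage1"),
    ("stage1 glibc", ["binutils", "gcc-stage1"],                "glibc-stage1"),
    ("stage2 gcc",   ["system-cc", "binutils", "glibc-stage1"], "gcc-stage2"),
    ("final glibc",  ["binutils", "gcc-stage2"],                "glibc-final"),
    ("final gcc",    ["system-cc", "binutils", "glibc-final"],  "gcc-final"),
]


def check_order(steps, preexisting=("system-cc",)):
    """Return a list of problems; an empty list means every input of every
    step is either pre-existing or produced by an earlier step."""
    available = set(preexisting)
    problems = []
    for name, inputs, output in steps:
        for dep in inputs:
            if dep not in available:
                problems.append(f"{name} needs {dep} before it is built")
        available.add(output)
    return problems


def glibc_built_by_system_cc(steps):
    """Names of glibc steps that take the system compiler directly as input."""
    return [name for name, inputs, _ in steps
            if "glibc" in name and "system-cc" in inputs]


if __name__ == "__main__":
    print(check_order(STEPS))               # prints []
    print(glibc_built_by_system_cc(STEPS))  # prints []
```

Swapping steps 2 and 3, for example, makes `check_order` report that stage1 glibc needs `gcc-stage1` before it is built.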
(My remaining difficulty is with step 6: the stage2 gcc uses a sysroot
that prevents it from linking properly against the final glibc, but
I'll work that out.)
These steps are complicated by gcc's library dependencies (zlib, gmp,
mpfr, mpc), which must be built for the build, host, and target systems
at various points during the process, and also by the need to build
multiple versions of binutils, because of binutils' "feature" of
requiring the sysroot to be a compile-time option instead of a runtime
option.
What the above sequence produces is a cross-compiler built to run on the
build system targeting a given target system, which is not the end goal
of the bootstrapping process, but does produce cross-compilers that are
needed to complete the process.
That sequence is run twice: once to produce a cross-compiler that runs
on the build system and targets the host system, and once to produce a
cross-compiler that runs on the build system and targets the target
system (if host=target, then only one build is necessary).
Finally, once cross-compilers for both the host and target systems are
available, a final binutils and gcc are built that run on the host
system and target the target system.
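The overall plan (one or two cross passes that run on the build system, then a final host-to-target build) can be sketched as a function that lists which tool is configured with which triples. This is a hypothetical Python sketch; the triples in the usage are made up, glibc is treated as configured with `--host` set to the system it will run on, and the extra binutils rebuilds needed for sysroot handling are not modeled, so its build counts will not match the tallies that follow.

```python
from typing import Dict, List, Tuple

# The six-step sequence, run once per cross toolchain (names are illustrative).
SEQUENCE = ["binutils", "stage1 gcc", "stage1 glibc",
            "stage2 gcc", "final glibc", "final gcc"]


def triples(tool: str, build: str, runs_on: str, targets: str) -> Dict[str, str]:
    """Configure triples for one build of one tool."""
    if "glibc" in tool:
        # glibc runs on the system it is built for, so that system is its --host.
        return {"--build": build, "--host": targets}
    return {"--build": build, "--host": runs_on, "--target": targets}


def plan(build: str, host: str, target: str) -> List[Tuple[str, Dict[str, str]]]:
    steps = []
    # Cross toolchains that run on the build system; one pass if host == target.
    cross_targets = [host] if host == target else [host, target]
    for t in cross_targets:
        for tool in SEQUENCE:
            steps.append((tool, triples(tool, build, build, t)))
    # Last pass: binutils and gcc that run on host and target the target,
    # compiled with the build->host cross-compiler from the first pass.
    for tool in ["binutils", "gcc"]:
        steps.append((tool, triples(tool, build, host, target)))
    return steps


if __name__ == "__main__":
    for tool, args in plan("x86_64-pc-linux-gnu", "i686-pc-linux-gnu",
                           "arm-linux-gnueabi"):
        print(f"{tool:13} " + " ".join(f"{k}={v}" for k, v in args.items()))
```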
These steps result in quite a few compiles:
- binutils is built 9 times
- gcc is built 6 times
- glibc is built 5 times
But I believe that this process succeeds in not depending on the
version of the build system compiler at all; it simply needs to be ISO
C90 compliant so that it can build gcc and binutils (as I mentioned,
the bootstrapped gcc and binutils are themselves used to build glibc).
At each step of the bootstrapping process, each tool is only dependent
on the other tools being built, except of course for the build system
ISO C90 compiler and C library.
One shortcoming of my approach is that the final version of glibc is not
built by a gcc that was built by itself; it is instead built by a gcc
that was built by the build system compiler. Does this matter? If so I
think the easiest thing for me to do would be to adapt my script to
first build binutils, gcc, and glibc with build=host=target, and then
use that as the "build system toolchain" for the other steps I outlined
above. Then the versions of the compiler and binutils that will be used
to produce the final versions of glibc will have been built by the
target toolchain itself instead of by the system toolchain. This will
add 4 more binutils builds, 3 more gcc builds, and 2 more glibc builds
to the mix, but it will hopefully produce even more robust output.
I think that some of my steps could be simplified if I could convince
myself that I don't need to use sysroots during various stages of the
bootstrapping, and can just reference the build system includes and
libraries instead of trying to always be sure that every step references
only the toolchain being built. Is it a worthwhile goal to try to make
every build step rely only on the toolchain being built instead of the
build system toolchain?
Finally, can someone validate my assumptions here:
1. When gcc is built, it should be built with a --with-build-sysroot
that references the version of glibc being built rather than the build
system's libc.
2. When glibc is built, it is OK for it to reference the build system's
libc header files rather than its own. I haven't figured out how to
configure glibc's build to reference only its own headers instead of the
system libc headers (I try to avoid setting CFLAGS because it wreaks
havoc with configure).
3. --with-build-sysroot is sufficient to make gcc builds reference only
the glibc headers and libraries produced during the bootstrapping
process.
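For reference when weighing assumptions 1 and 3, here is a hypothetical Python helper that assembles the sysroot-related gcc configure arguments for the stage2 and final gcc builds. As I read GCC's installation documentation, `--with-sysroot` sets the sysroot the installed compiler searches, while `--with-build-sysroot` replaces it only while gcc's target libraries are being built (and is only useful together with `--with-sysroot`); gcc's own programs are still compiled by the build system compiler. The helper name, paths, and triple are assumptions for illustration, not taken from the script.

```python
import shlex
from typing import List


def gcc_configure_args(target: str, prefix: str, sysroot: str,
                       build_sysroot: str, final: bool) -> List[str]:
    """Hypothetical helper: gcc configure arguments for the stage2
    (final=False) or final (final=True) gcc build."""
    args = [
        f"--target={target}",
        f"--prefix={prefix}",
        # Sysroot the installed compiler searches for target headers/libs.
        f"--with-sysroot={sysroot}",
        # Used instead of --with-sysroot while gcc's target libraries are
        # built; here it points at the glibc staged by the bootstrap.
        f"--with-build-sysroot={build_sysroot}",
    ]
    if not final:
        # stage2 is built against the purely static stage1 glibc.
        args.append("--disable-libmudflap")
    return args


if __name__ == "__main__":
    args = gcc_configure_args(
        target="arm-linux-gnueabi",                     # hypothetical triple
        prefix="/opt/cross",                            # hypothetical path
        sysroot="/opt/cross/arm-linux-gnueabi/sysroot", # hypothetical path
        build_sysroot="/tmp/bootstrap/glibc-stage1",    # hypothetical path
        final=False,
    )
    print("../gcc/configure " + shlex.join(args))
```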
Sorry for the disjoint and wordy nature of this post; I'm really tired
after many long hours of hacking on this script.
Thanks!
Bryan