Re: Compilation of lengthy C++ Files

On Fri, 2023-10-20 at 23:08 +0200, Kai Song via Gcc-help wrote:
> Yes. I have invested time on my end to cause some understanding for
> an issue.

Maybe I can provide context.  Note that I'm not a GCC developer,
although I have worked on compiler implementations in the past.

The C++ standard defines the language in the abstract.  Individual
compiler implementations of that standard will, obviously, impose
limits on the abstraction: computers are not abstract, and so they
are limited.

When a compiler compiles a single function, _especially_ if you enable
optimization, it cannot simply churn through that function line by
line, spit out some assembly, and move on to the next line without any
context.  Instead, within a single function the compiler must remember
all sorts of things about the lines it has already parsed.

The longer your function is, the more the compiler has to remember
about everything that came before.
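
To make that concrete, here is a sketch of the kind of machine-generated
code that tends to trigger this (the names are invented for
illustration, not taken from your code).  Every temporary is
potentially live until the end of the function, so the compiler's
bookkeeping grows with the length of the function instead of staying
bounded:

    /* One generated function with N statements: the compiler must
       track the definition, uses, and live range of every temporary
       at once, so its working state grows with N. */
    double eval(const double *x)
    {
        double t0 = x[0] * x[1];
        double t1 = t0 + x[2];
        double t2 = t1 * t0;
        /* ... hundreds of thousands of similar lines elided ... */
        return t2;
    }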

When programmers write code they naturally break it up into functions
that can be managed and understood by other programmers, so compilers
expect that individual functions are of "manageable" size and aren't
hundreds of thousands of lines long, and implementations make choices
that reflect that expectation.

As a result there will be some limit on the size of function that a
compiler can handle with any sort of grace: beyond that, the choice of
internal data structures and algorithms may well lead to superlinear
(quadratic or worse) compile times.

Is this unfortunate?  Of course, abstractly it's always unfortunate
when a compiler limits the program you want to write.  But it's quite
reasonable for the compiler authors to decide to concentrate on the
99.999% of code they expect to see, and not worry about the 0.001% of
code they don't expect, and of which they probably don't even have
examples to test.

So, you can either decide to try to help fix the compiler so it
handles code like yours better, by doing performance analysis,
figuring out where the bottlenecks are, and providing patches that
help in these extreme edge cases (without damaging the "normal"
cases).

Or, you can change your code generator to output code that is more
like what the compiler expects: many smaller functions, potentially
even split across separate source files, rather than one gigantic
function containing all the code.  A sketch follows.
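
As a hedged sketch of that restructuring (the State type and function
names are invented for illustration): values that used to be local
temporaries move into a struct, and the generator emits many small
functions plus a driver that calls them in order, possibly one
function per source file:

    /* Generated state: holds what used to be local temporaries. */
    struct State { double t[100000]; double result; };

    /* part_000.cpp, part_001.cpp, ...: each defines one small piece. */
    void eval_part_000(State &s) { s.t[2] = s.t[0] * s.t[1]; }
    void eval_part_001(State &s) { s.result = s.t[2] + s.t[0]; }

    /* driver.cpp: runs the pieces in order. */
    double eval(State &s)
    {
        eval_part_000(s);
        eval_part_001(s);
        /* ... calls to the remaining parts ... */
        return s.result;
    }

The trade-off is that values which used to sit in registers now
round-trip through memory, but each function stays small enough for
the compiler's per-function analyses to remain cheap.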

Or, I guess, you can look for a different compiler that has different
goals.

So to answer your questions:

> 1) Am I doing something wrong in that GCC should be able to compile
> lengthy codes?

I'm not sure exactly what this question is asking.  If what you mean
is "is there some flag or option I can give GCC to get it to build my
code?" then no, I doubt it.  Disabling optimization altogether will
help, but may not be enough; I assume you already tried that.
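
For completeness, "disabling optimization" means something like the
line below (huge.cpp again stands in for your file).  If you were also
passing -g, dropping it can help too, since generating debug
information for enormous functions is itself expensive:

    g++ -O0 -c huge.cpp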

If you are asking whether GCC should be able to compile these huge
functions, then I guess the answer is that the GCC authors don't
consider this an important use case, so they haven't tried to make it
work well.

> 2) Is it known that GCC is unable to compile lengthy codes?

I doubt that it's known per se, because I'm quite sure there are no
tests in the compiler test suite that try to compile functions at this
scale.

But if you tell us you've found that it can't be done, I doubt anyone
on this list will be particularly surprised.

> 3) Is it acknowledged that a compiler's ability to compile large
> files is relevant?

As above, I doubt the GCC developers feel that it's particularly
relevant, no.  Which is not the same thing as saying that, if someone
pointed out a specific problem and provided a reasonable fix, they
wouldn't include it.

> 4) Are possible roots known for this inability and are these
> deliberate?

I would say, as above, that the GCC developers do not have such huge
functions in their test suite and so no, they are not aware of exactly
what problems would occur when trying to compile them.  If they were to
sit and think about it for a while, I'm sure various developers could
imagine possible bottlenecks that would cause huge functions to compile
slowly, but as always with performance the only way to really know is
to measure it.

I don't know what you mean by "are these deliberate".  I'm sure they
didn't sit down to write the compiler implementation and say "let's
make it slow for huge functions!"

But it could well be that, when making design choices, there was an
approach that was much easier to implement, or that allowed faster
compilation of medium-sized functions, but that would be hugely
inefficient for gigantic functions, and they said "well, these huge
functions are extremely unlikely, so we won't optimize for that".

There's no way to know until, as above, someone sits down with such a
huge file, profiles the compiler as it tries to compile the function,
and figures out what is going on.