Re: Compilation of lengthy C++ Files

On Thu, 19 Oct 2023 at 15:17, Kai Song via Gcc-help
<gcc-help@xxxxxxxxxxx> wrote:
>
> Thank you for your feedback!
>
> I forward the requested information to everyone as it may be relevant.
>
> >> Jonathan & David: How much RAM? Quadratic time-complexity.
> I have 256GB of RAM and 64 cores. I launch g++ from the Windows 10
> Terminal. From my information, Windows 10 will not limit the resources
> given to the Terminal.
> I am perfectly fine if compile times are in the time-span of months. For
> the Runge-Kutta method, I would be happy with compile times of years as
> this would mean I have a faster method in a few years.
>
> >> Probably everyone: Is your code reasonable?
> When trying to code-gen into shorter code, there are a few considerations:
> 1) Will any generated application be limitable into a certain volume of
> source-code? Certainly not, so why even argue on one particular instance of
> volume?

I have no idea what this means.

> 2) May it necessitate significant amounts of work to enable a
> code-generator to produce shorter code? Certainly.

Well at the moment your code doesn't even compile. So you can keep
generating billion line functions that don't work, or you could try
refactoring it. It might not be easy, but if it compiles and works,
surely that's better than billion line functions that don't work?


> 3) Suppose a code-generator exploits structure of a given user-provided
> problem instance for generation of shorter code. Will this mean the set of
> feasible problems treatable by the code-generator shrinks? Likely.

I don't see why that should be true.

> 4) Is 99% of the code trivial and should be compilable in read-time? I
> believe so, given my naive information.

A billion lines of trivial operations still consumes ridiculous
amounts of resources to compile. Your naive view doesn't seem
relevant.

> 5) Should months of work be invested into generating shorter code in order
> for that to be able to compile in 5 minutes when really this code is
> compiled and used only once in a lifetime? I am unsure about this.
>
> The stinging issue of argument may be on aspect 4), which makes it seem as
> though the other 99% should be wrappable into functions.
> The, if we call it so, mis-expectation is that then clean interfaces to
> these functions are generatable or may even not exist. To give an example,
> suppose a skein of code-strings in which each invoked 1% non-trivial code
> is changing one string so that it induces chaos into the naming of all
> other temporary variable names, causing mismatch between contiguity between
> data at code-gen time and generated-code time. That is the central issue
> that I --admittedly-- brute-force by committing into lengthy code.

I don't understand this either, sorry.

Nothing you have said actually explains why you can't refactor the
code into utility functions that avoid doing everything in one huge
function.
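
As a rough sketch of what I mean (not your generator's actual output; the
names Workspace, chunk_0, chunk_1 and run_generated are made up for
illustration): if the generator numbers its temporaries and keeps them in
one workspace array instead of giving each a name, then every chunk of
generated statements has the same trivial interface, and the renaming
problem described above does not affect the function signatures. This
assumes the temporaries all have one type, which may not hold for your code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical layout: every generated temporary is a slot in one array,
// so all chunks share the same signature regardless of what they compute.
using Workspace = std::array<double, 8>;

// Each chunk holds a bounded number of generated statements.
static void chunk_0(Workspace& t) {
    t[2] = t[0] + t[1];
    t[3] = t[2] * t[0];
}

static void chunk_1(Workspace& t) {
    t[4] = t[3] - t[1];
    t[5] = t[4] * t[4];
}

// Table of chunks: the driver stays small however many chunks exist.
using ChunkFn = void (*)(Workspace&);
static constexpr ChunkFn chunks[] = { chunk_0, chunk_1 };

double run_generated(double a, double b) {
    Workspace t{};              // all slots start at zero
    t[0] = a;                   // inputs occupy the first slots
    t[1] = b;
    for (ChunkFn f : chunks) {  // run the chunks in generation order
        f(t);
    }
    return t[5];                // slot holding the final result
}
```

The generator would emit a new chunk_N every few thousand statements, so
no single function grows without bound and the chunks could even go into
separate translation units.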

>
> >> Jonathan: Have you tried using clang instead?
>
> I tried ICC (won't install successfully on my Windows 10 PC) and then CLANG
> 17.0.0 .
>
> CLANG requires a provided STL, so I used the ones from Microsoft Visual
> Studio 2022, which I referenced via the -isystem flag.
> In order to compile it, I had to make tons of changes that g++ would not
> have complained about; like putting a "d" at the end of a floating-point
> value.

Eh?!

You must be doing something wrong. Clang does not require that.
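
For reference, a minimal illustration (not taken from your code; the
constant three_halves is only an example name): in standard C++ a
floating-point literal with no suffix already has type double, and "d" is
not one of the standard suffixes.

```cpp
#include <type_traits>

// An unsuffixed floating-point literal has type double.
static_assert(std::is_same<decltype(1.5), double>::value,
              "no suffix means double");
// The standard suffixes select float and long double.
static_assert(std::is_same<decltype(1.5f), float>::value,
              "f suffix means float");
static_assert(std::is_same<decltype(1.5L), long double>::value,
              "L suffix means long double");

// No "d" needed (or defined by the standard) to get a double:
constexpr double three_halves = 1.5;
```

So a change like turning 1.5 into 1.5d should not be needed by any
conforming compiler.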

> To test all these changes, I chose a smaller codegen parameter for which
> g++ is able to compile the code and yields the expected result.
> On that smaller code, clang instead fails with the following error:
>
> // source code from line 201: therein, k is a template parameter, nx is a
> // static constexpr std::array<const size_t>, and mx is a static constexpr
> // size_t.
> random_generator.template setRandn<(size_t)(mx-nx[k])>(  x+nx[k]);
> random_generator.template setRandn<(size_t)(mx-nx[k])>(  xp+nx[k]);
> random_generator.template setRandn<(size_t)(mb-nb[k])>( Ax+nb[k]);
> random_generator.template setRandn<(size_t)(mb-nb[k])>( Axp+nb[k]);
> random_generator.template setRandn<(size_t)(mx-nx[k])>( dx+nx[k]);
> random_generator.template setRandn<(size_t)(mb-nb[k])>(dAx+nb[k]);
> for(size_t i=nc[k];i<mc;++i){ calc.template setRandn<numPages>(gc[i]);
> calc.template setRandn<numPages>(gc0[i]); }
> random_generator.template setRandn<(size_t)(mx-nx[k])>( x0+nx[k]);
>
> // clang compiler error message
> error: no matching member function for call to 'setRandn'
> 206 | random_generator.template setRandn<(size_t)(mx-nx[k])>( x0+nx[k]);
>
> The error appears nonsensical to me, particularly since g++ was able to
> compile it, and since clang accepts the same expression five lines earlier
> but not in the present line.
> The code snippet is from a test class that adds noise to vector-components
> that should be uninvolved in certain test computations.
>
> If it helps, I can work on making it compilable to clang in an effort to
> identify up to which size either compiler is capable of compiling it.
> Please let me know what helps.
>
> Kind regards,
> Kai
>
> On Thu, Oct 19, 2023 at 2:47 PM David Brown <david@xxxxxxxxxxxxxxx> wrote:
>
> > On 18/10/2023 18:04, Kai Song via Gcc-help wrote:
> > > Dear GCC Developers,
> > >
> > > I am unsuccessfully using g++ 12.0.4 to compile lengthy c++ codes. Those
> > > codes are automatically generated from my own code-generator tools that
> > > depend on parameters p.
> > > Typical applications are:
> > > - Taylor series of order p inserted into consistency conditions of
> > > numerical schemes, to determine optimal method parameters (think of,
> > > e.g., Runge-Kutta methods)
> > > - recursive automatic code transformation (think of adjoints of
> > > adjoints of adjoints...) of recursion level p
> > > - Hilbert curves or other space-filling curves to generate code that
> > > simulates cache utilization in a Monte-Carlo context
> > >
> > > I verify that for small p the codes compile and execute to the expected
> > > result. However, there is always a threshold for p so that the generated
> > > cpp file is so long that the compiler will just terminate after ~10min
> > > without monitor output but return the value +1.
> > > My cpp files range from 600k LOC up to 1 billion LOC. Often, the file
> > > consists of one single c++ template class member function definition
> > > that relies on a few thousand lines of template-classes.
> > >
> > > I would like to know:
> > > 1) Am I doing something wrong in that GCC should be able to compile
> > > lengthy codes?
> > > 2) Is it known that GCC is unable to compile lengthy codes?
> > > 3) Is it acknowledged that a compiler's ability to compile large files is
> > > relevant?
> > > 4) Are possible roots known for this inability and are these deliberate?
> > >
> >
> > I am curious to know why you are generating code like this.  I can see
> > how some code generators for mathematical code could easily produce
> > large amounts of code, but it is rarely ideal for real-world uses.  Such
> > flattened code can reduce overheads and improve optimisation
> > opportunities (like inlining, constant folding, function cloning, etc.)
> > for small cases, but then they get impractical for compiling while the
> > costs for cache misses outweigh the overhead cost for the loops or
> > recursion needed for general solutions.
> >
> > Any compiler is going to be targeted and tuned towards "normal" or
> > "typical" code.  That means primarily hand-written code, or smaller
> > generated code.  I know that some systems generate very large functions
> > or large files, but those are primarily C code, and the code is often
> > very simple and "flat".  (Applications here include compilers that use C
> > as an intermediary target language, and simulators of various kinds.)  It
> > typically makes sense to disable certain optimisation passes here, and a
> > number of passes scale badly (quadratic or perhaps worse) with function
> > size.
> >
> > However, if you are generating huge templates in C++, you are going a
> > big step beyond that - templates are, in a sense, code generators
> > themselves that run at compile time as an interpreted meta-language.  I
> > don't expect that there has been a deliberate decision to limit GCC's
> > handling of larger files, but I can't imagine that huge templates are a
> > major focus for the compiler development.  And I would expect enormous
> > memory use and compile times even when it does work.
> >
> >
> >



