Dear everyone,

Clang now compiles the lengthy 600k-LOC code into a functioning object
that can be linked and yields the expected result. Clang was merely
unwilling to perform a certain pointer type-cast implicitly.

Given that Clang compiles the object, what can be inferred?

Further questions:

- Are there any pragmas that can help the compiler identify code
  portions of linear compile-complexity?
- What compiler flag combination should I try to increase the chances
  of successfully compiling lengthy files?

@Jonathan, I am happy to clarify:

> > 1) Will any generated application be limitable to a certain volume
> > of source code? Certainly not, so why even argue about one
> > particular instance of volume?
>
> I have no idea what this means.

Is every program on earth writable in X lines of code? If so, which
value is X? Only if X exists and is known can one discuss whether
source code of length Y should exist.

> > 2) May it necessitate significant amounts of work to enable a
> > code-generator to produce shorter code? Certainly.
>
> Well, at the moment your code doesn't even compile. So you can keep
> generating billion-line functions that don't work, or you could try
> refactoring it. It might not be easy, but if it compiles and works,
> surely that's better than billion-line functions that don't work?

But certainly, executing code of length N is an easier problem (linear
in N) than figuring out whether code of length N can be reduced to
length M (exponentially hard in N). Also, being able to execute code of
length N is far more useful than knowing, for one niche class of
problems, by how much N can be reduced -- not even mentioning whether M
is enough smaller to matter.

> > 3) Suppose a code-generator exploits structure of a given
> > user-provided problem instance to generate shorter code. Will this
> > mean the set of feasible problems treatable by the code-generator
> > shrinks? Likely.
>
> I don't see why that should be true.

In order to find a structure in the user-provided instance that the
code-generator could exploit, that information-processor would have to
be implemented, verified, executed, and function reliably. Whenever the
structure does not apply, the benefit cannot be reaped: the program
length does not shrink, and all the effort was in vain. The general
trade-off from information theory applies: attempting to solve problems
more "smartly" yields a solver that is more efficient but less robust.
Efficiency, robustness, and genericity always form a Pareto curve.
Think of higher-order numerical methods, which add overhead and only
pay off when instances are sufficiently smooth. Or think of advanced
alpha-beta-pruning schemes that are only beneficial for highly
structured niche problems, such as chess.

> > 4) Is 99% of the code trivial, and should it be compilable in
> > read-time? I believe so, given my naive information.
>
> A billion lines of trivial operations still consume ridiculous
> amounts of resources to compile. Your naive view doesn't seem
> relevant.

I am a prisoner of my own perspective, so I must respect that you say I
cannot know. I do know that I could compute my particular code with
pencil and paper in time linear in its length -- just my human
machine-speed is nine orders of magnitude too slow.
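To make this more than a matter of belief, here is a minimal sketch --
a toy stand-in, not my actual generator (make_gen.cpp and its output
are hypothetical) -- that emits a file of N trivial, dependent
assignments. Timing the compiler on its output for several N shows
whether compile time actually grows linearly with the length of
straight-line code:

    // make_gen.cpp -- hypothetical toy generator, for illustration only.
    // Writes gen.cpp: one function containing N trivial, dependent
    // assignments, mimicking machine-generated straight-line code.
    #include <cstdio>
    #include <cstdlib>

    int main(int argc, char** argv)
    {
        long n = (argc > 1) ? std::atol(argv[1]) : 100000;
        std::FILE* f = std::fopen("gen.cpp", "w");
        if (!f) return 1;
        std::fprintf(f, "double run()\n{\n    double t0 = 1.0;\n");
        for (long i = 1; i <= n; ++i)   // each line depends on the previous
            std::fprintf(f, "    double t%ld = t%ld * 1.0000001;\n", i, i - 1);
        std::fprintf(f, "    return t%ld;\n}\n", n);
        std::fclose(f);
        return 0;
    }

After "g++ make_gen.cpp -o make_gen && ./make_gen 1000000", doubling N
and re-running "time clang++ -O0 -c gen.cpp" should roughly double the
compile time if the scaling is linear; -ftime-report (supported by both
g++ and clang) breaks the time down by compiler pass.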
> > 5) Should months of work be invested into generating shorter code,
> > so that it compiles in 5 minutes, when really this code is compiled
> > and used only once in a lifetime? I am unsure about this.
> >
> > The stinging issue of the argument may be aspect 4), which makes it
> > seem as though the other 99% should be wrappable into functions.
> > The mis-expectation, if we may call it so, is that clean interfaces
> > to these functions are then generatable -- they may not even exist.
> > To give an example, suppose a skein of code-strings in which each
> > invocation of the 1% of non-trivial code changes one string so that
> > it induces chaos into the naming of all other temporary variables,
> > causing a mismatch between the contiguity of data at
> > code-generation time and at generated-code time. That is the
> > central issue that I -- admittedly -- brute-force by committing to
> > lengthy code.
>
> I don't understand this either, sorry.

Argument 5 was about the proportionality between work time and compile
time. I would rather spend 5 minutes on coding and 5 months on
compiling than 1 week on coding and 5 minutes on compiling, unless the
code must be compiled and run multiple times -- compiling is unattended
machine time, coding is my working time. The elaboration on point 4
gives a vivid description of a code pattern that results in chaos,
supporting the hypothesis that reducing a program of length N is in
O(exp(N)), whereas running it is in O(N).

> Nothing you have said actually explains why you can't refactor the
> code into utility functions that avoid doing everything in one huge
> function.

Because no two pieces of code are exactly identical. The relative
distance between two variables involved in an otherwise identical
formula changes with every line. Example:

    Line 1000:  tmp[100] = foo(tmp[101], tmp[102]);
    Line 2000:  tmp[200] = foo(tmp[201], tmp[203]);  // dang it: not tmp[202] but tmp[203]

It is like a Penrose tiling: everything looks identical, but the
details break it. You just do not find two identical local areas --
nowhere. And if you do, you had to search for them by brute force,
which becomes useless whenever the particular pattern you try to find
simply does not exist in that particular user's instance.

> > > Jonathan: Have you tried using clang instead?
> >
> > I tried ICC (it won't install successfully on my Windows 10 PC) and
> > then Clang 17.0.0.
> >
> > Clang requires a provided STL, so I used the one from Microsoft
> > Visual Studio 2022, which I referenced via the -isystem flag. In
> > order to compile the code, I had to make tons of changes for things
> > g++ would not have complained about, like a "d" suffix at the end
> > of a floating-point literal.
>
> Eh?!
> You must be doing something wrong. Clang does not require that.

As I have now found, ICC might not work because my CPU is from AMD. I
was told LLVM does not come with its own STL; I understand you are
saying it does. And while g++ accepts

    Tfloat a = 0.1d;   // "d" suffix: accepted by g++

the same appeared untrue for Clang.

Kind regards,
Kai