On 19/10/2023 16:16, Kai Song via Gcc-help wrote:
> Thank you for your feedback!
> I forward the requested information to everyone as it may be relevant.
(Note - I am not a GCC developer, I am just a user, whereas Jonathan is
one of the key developers of GCC. So if he ever writes something that
contradicts me, believe Jonathan, not me!)
> Jonathan & David: How much RAM? Quadratic time-complexity.
> I have 256GB of RAM and 64 cores. I launch g++ from the Windows 10
> Terminal. From my information, Windows 10 will not limit the resources
> given to the Terminal.
I don't want to sound anti-Windows, but I think it is very unusual to
see someone using Windows for something like this. My own experience,
as someone who has both Linux and Windows machines on my desk because
my work uses both systems, is that big builds with GCC can be at least
twice as fast on Linux as on Windows with the same hardware. Linux is
simply better than Windows for this kind of thing - it has better
handling of large amounts of RAM, better multi-core handling, better
file handling and better handling of multiple processes. (Windows is
better for some other things, but not this kind of thing.) What kind of
difference it would make for you here, I don't know - but with the
massive compile times you have, it is worth the effort of checking.
It also seems crazy to use off-the-shelf builds of GCC here, if that is
what you are doing - under the same circumstances, I'd be looking at
building GCC from source with the best choice of flags to shave the last
few percent off the times. I'd build it with "-march=native" so that the
compiler itself is tuned for exactly your processor, and make sure the
build runs across all 64 cores. (Consider a Linux distro like Gentoo,
which compiles everything from source, if you don't want to learn how to
build GCC yourself.)
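The details vary with the GCC version and your system, but the build
itself is roughly along these lines - the version number, install
prefix and flag choices below are only placeholders of mine, and the
official GCC installation notes are the reference:

    tar xf gcc-13.2.0.tar.xz
    cd gcc-13.2.0 && ./contrib/download_prerequisites   # GMP, MPFR, MPC, ISL
    cd .. && mkdir build-gcc && cd build-gcc
    ../gcc-13.2.0/configure --prefix=$HOME/opt/gcc-custom \
        --enable-languages=c,c++ --disable-multilib
    make -j 64 BOOT_CFLAGS="-O2 -march=native"
    make install

BOOT_CFLAGS applies "-march=native" to the later bootstrap stages, so
the installed compiler binaries are themselves built for your processor;
your own code still needs "-march=native" (or whatever you choose) in
its own compile flags.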
> I am perfectly fine if compile times are in the time-span of months. For
> the Runge-Kutta method, I would be happy with compile times of years as
> this would mean I have a faster method in a few years.
But would the resulting binary, generated after months of compiling,
actually be faster? Big source code is going to result in a big binary,
which is likely to be slower in practice due to cache misses.
Also note that there are pretty much no calculations taking years to run
that are actually worth running. If a calculation is going to take 2
years, you are better off selling your current computer, investing the
money in a fund, taking it out in a year's time, buying a new computer
at that point that is three times as fast, and running the calculation
in 8 months - 12 months of waiting plus 8 months of computing is 20
months in total, so you finish sooner than the 24 months it would take
if you started today with your current machine.
Perhaps I am completely misunderstanding what you are actually trying to
achieve.
In my mind, I am comparing your task to something simpler (one that
everyone here can understand, avoiding the post-grad level mathematics).
Sometimes it is nice to have integer types bigger than the usual maximum
of 64 bits. It's reasonably straightforward to make a C++ class for a
128-bit integer type with operator overloading, built from two 64-bit
integers. Then you can make a 256-bit integer type built from two
128-bit integers. Then you write a template that does the doubling
automatically, and the recursive template setup lets you write:
using uint4096_t = big_uint<12>; // 2 ** 12 bits
and you are ready to use it in your RSA encryption code. It sounds
simple - and it works well for 128-bit and 256-bit integers. But while
it will compile for 4096-bit integers, and give the right answers, its
performance is likely to be terrible. Long before reaching 4096 bits, it
makes more sense to move to an arbitrary-precision integer library (GMP,
for example), using loops over 64-bit chunks rather than just throwing a
big template at the compiler and drinking coffee during compiles.
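To make the analogy concrete, here is a minimal sketch of that recursive
template idea - my own illustration rather than tested code, showing
only addition, with the template parameter taken as log2 of the bit
width (so big_uint<6> is the 64-bit base case and big_uint<12> is 4096
bits):

    // Illustrative sketch only: the parameter is log2 of the bit
    // width, and only addition is implemented.
    #include <cstdint>

    template <unsigned LogBits>
    struct big_uint {
        // A 2**LogBits-bit integer is two 2**(LogBits-1)-bit halves.
        big_uint<LogBits - 1> lo, hi;
    };

    template <>
    struct big_uint<6> {
        std::uint64_t value = 0;   // base case: one 64-bit word
    };

    // Compute out = a + b + carry_in, returning the carry out.
    inline bool add_carry(big_uint<6>& out, const big_uint<6>& a,
                          const big_uint<6>& b, bool carry_in) {
        out.value = a.value + b.value + (carry_in ? 1 : 0);
        return carry_in ? out.value <= a.value : out.value < a.value;
    }

    template <unsigned LogBits>
    bool add_carry(big_uint<LogBits>& out, const big_uint<LogBits>& a,
                   const big_uint<LogBits>& b, bool carry_in) {
        bool c = add_carry(out.lo, a.lo, b.lo, carry_in);
        return add_carry(out.hi, a.hi, b.hi, c);
    }

    template <unsigned LogBits>
    big_uint<LogBits> operator+(const big_uint<LogBits>& a,
                                const big_uint<LogBits>& b) {
        big_uint<LogBits> result{};
        add_carry(result, a, b, false);  // carry out of the top is dropped
        return result;
    }

    using uint4096_t = big_uint<12>;     // 2 ** 12 = 4096 bits

Every operation on a big_uint<12> unrolls at compile time into a full
tree of operations over 64 leaf words, all visible to the compiler at
once - which is exactly what blows up compile time and code size, where
a library looping over 64-bit limbs stays small and cache-friendly.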
Your problem sounds like the same principle, only more so. But as I
say, maybe I am missing something vital in my understanding of your problem.