Hello Simon,
thank you for these tips.
I've looked at the generated C code. The "killer" seems to be
cob_move(), which is obviously necessary to move data between different
variable types and structures. In this case it's the moves "move 1 to k"
and "move 0 to k". Since source and destination have differing
structures (pic 9 comp <--> constant 0/1) cob_move() does the job.
But a simple modification of the program changes the timing dramatically:
---(tspeed3.cob)---
identification division.
program-id. tspeed3.
data division.
working-storage section.
01 i pic 9(7) comp-5 value 0.
01 j pic s9(14) comp-5 value 0.
01 k pic 9 comp-5 value 0.
01 k0 pic 9 comp-5 value 0.
01 k1 pic 9 comp-5 value 1.
PROCEDURE DIVISION.
move 0 to j, k
perform varying i from 0 by 1 until i >= 10000000
if k = k0
add i to j
move k1 to k
else
subtract i from j
move k0 to k
end-if
end-perform
display j upon console
stop run.
---
$ time ./tspeed3
-00000000000005000000
real 0m0.076s
user 0m0.070s
sys 0m0.003s
---
$ time ./tspeed2
-00000000000005000000
real 0m0.382s
user 0m0.380s
sys 0m0.000s
---
Now the compiler generates a direct assignment instead of a cob_move().
The same could go for the comparison ("if k = 0" <--> "if k = k0") as
well: In both cases a "cob_cmp_u8()" is generated but in the latter case
a direct comparison would be possible, just like the direct assignment
is possible (and way faster).
Wouldn't it be possible to optimize these cases at compile time?
-fnotrunc gives the same boost here, but it might be "too invasive"
for legacy code.
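To illustrate why a generic move call costs more than a direct assignment, here is a minimal C sketch. It is not libcob code and not what cobc emits: `struct field`, `field_get`, `generic_move` and `direct_move` are hypothetical names, and the sketch assumes the literal is held as digit characters while `k` is a single native byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor for illustration; NOT libcob's structure. */
enum field_kind { KIND_BINARY_U8, KIND_DISPLAY_DIGITS };

struct field {
    enum field_kind kind;
    size_t size;          /* bytes of storage */
    unsigned char *data;  /* pointer to the storage */
};

/* Read a field as an unsigned integer, whatever its representation. */
static unsigned long field_get(const struct field *f)
{
    unsigned long v = 0;
    size_t n;

    if (f->kind == KIND_BINARY_U8)
        return f->data[0];
    for (n = 0; n < f->size; n++)       /* digit characters, e.g. "1" */
        v = v * 10 + (unsigned long)(f->data[n] - '0');
    return v;
}

/* Generic move: inspect both descriptors at run time, then convert. */
static void generic_move(const struct field *src, struct field *dst)
{
    unsigned long v;
    size_t n;

    assert(src != NULL && dst != NULL);
    v = field_get(src);
    if (dst->kind == KIND_BINARY_U8) {
        dst->data[0] = (unsigned char)v;
        return;
    }
    for (n = dst->size; n > 0; n--) {   /* write digits right to left */
        dst->data[n - 1] = (unsigned char)('0' + v % 10);
        v /= 10;
    }
}

/* Direct move: both sides are known at compile time to be one byte. */
static void direct_move(const unsigned char *src, unsigned char *dst)
{
    *dst = *src;
}
```

In this sketch "move 1 to k" goes through `generic_move` with a digit-character source, while "move k1 to k" reduces to `direct_move`, a single byte copy that the C compiler can inline.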
Thank you very much.
Best regards,
Christian
On 30.12.20 at 15:27, Simon Sobisch wrote:
On 30.12.2020 at 11:48, Christian Lademann (ZLS) wrote:
Hello gnucobol team
first of all let me point out how happy I am that there is an open
source COBOL implementation that is impressively capable and compatible
and growing from release to release. I've been experimenting with
open-cobol and gnucobol for quite some time now.
Thanks. We're happy to be able to provide GnuCOBOL :-)
While writing some test programs for comparing computational speed, I
got the impression that there might be some bottlenecks.
Of course there are differences as the rules for arithmetic differ
between C and COBOL.
For example:
---(tspeed2.cob)---
identification division.
program-id. tspeed2.
data division.
working-storage section.
01 i pic 9(7) comp-5 value 0.
01 j pic s9(14) comp-5 value 0.
01 k pic 9 comp-5 value 0.
procedure division.
move 0 to j, k
perform varying i from 0 by 1 until i >= 10000000
if k = 0
add i to j
move 1 to k
else
subtract i from j
move 0 to k
end-if
end-perform
display j upon console
stop run.
---
$ cobc -x -O2 tspeed2.cob
$ time ./tspeed2
-000000005000000
real 0m0.373s
user 0m0.367s
sys 0m0.003s
---(c-tspeed2.c)---
#include <stdio.h>
int main(void) {
long i, j;
int k;
j = 0;
k = 0;
for(i = 0; i < 10000000; i++) {
if(k)
j -= i;
else
j += i;
k = !k;
}
printf("j=%ld\n", j);
}
---
$ make c-tspeed2
$ time ./c-tspeed2
j=-5000000
real 0m0.027s
user 0m0.027s
sys 0m0.000s
The C variant of this benchmark is more than 10 times faster!
I know that this is just a very narrowly scoped look at performance and
maybe the comparison is unfair.
That's actually a reasonable test, the only "unfair" part is that you
have a very similar test but COBOL "works different" so you're testing
different things ;-)
But I might be missing something.
Are there other performance tweaks, besides "-O2"?
To get the fastest COBOL: ensure that GnuCOBOL itself is heavily optimized:
* install the most current C compiler available
* build GMP (or MPIR) from source, this normally will produce a library
that is bound to at least the current CPU features (after changing the
CPU you'd want to redo that part)
* get the most current GnuCOBOL version, build it from source with
CFLAGS="-O2 -march=native" and pointing to the installed libgmp/libmpir
That will normally provide you with a better overall speed, in most
cases more than cobc -O2 will provide.
cobc -O2 something.cob always leads to longer compilation times, but the
generated modules are often not much faster than with -O; -Os often does
make a difference in the size of the module; -O0 is sometimes necessary
if you compile big sources and cobc has to use an outdated compiler (or
new MSVC...).
The biggest difference you'll normally see for heavy arithmetic is when
you explicitly disable ANSI truncation rules (also true for other COBOL
compilers).
cobc -x -fnotrunc tspeed2.cob
will give you a much faster result (around twice the run time of the C
program; it actually _does_ nearly the same thing as your C program).
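As a rough illustration of what the truncation rules add: under standard truncation a value stored into a binary item is cut to the decimal digits of its PICTURE, which costs an extra remainder operation per store. The C sketch below shows only that idea. The names `pow10_i64`, `store_truncated` and `store_native` are hypothetical, and this is not the code cobc generates.

```c
#include <assert.h>
#include <stdint.h>

/* 10^digits for 1 <= digits <= 18, which still fits in an int64_t. */
static int64_t pow10_i64(int digits)
{
    int64_t p = 1;

    assert(digits >= 1 && digits <= 18);
    while (digits-- > 0)
        p *= 10;
    return p;
}

/* Store with decimal truncation: keep only the low `digits` digits. */
static int64_t store_truncated(int64_t value, int digits)
{
    return value % pow10_i64(digits);
}

/* Store without truncation: a plain C assignment, as in the C benchmark. */
static int64_t store_native(int64_t value)
{
    return value;
}
```

With one digit, `store_truncated(12, 1)` keeps only the 2, whereas `store_native(12)` keeps the full binary value; the second form is what a plain C program does on every assignment.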
Note: in any case COBOL modules always have the additional workload of
initializing the GnuCOBOL runtime - which on this machine here takes
nearly the same time as the complete C program... - so the _actual_
workload of the COBOL module with -fnotrunc is nearly identical to the C
program :-)
To calculate the time needed for initializing the runtime: compile and
profile an "empty" COBOL program.
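On the C side the same separation can be sketched by timing only the loop inside the process, so that process startup is left out of the measurement. This is a minimal sketch under assumptions: `alternating_sum` and `time_loop` are hypothetical helper names, and `clock()` measures CPU time with limited resolution, so very short runs may report 0.

```c
#include <assert.h>
#include <time.h>

/* Same loop as c-tspeed2.c: add even i, subtract odd i. */
static long alternating_sum(long n)
{
    long i, j = 0;
    int k = 0;

    for (i = 0; i < n; i++) {
        if (k)
            j -= i;
        else
            j += i;
        k = !k;
    }
    return j;
}

/* CPU seconds spent in the loop only; process startup is excluded. */
static double time_loop(long n, long *result)
{
    clock_t start, end;

    assert(result != NULL);
    start = clock();
    *result = alternating_sum(n);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_loop(10000000, &j)` leaves the benchmark result in `j` and returns the loop time; subtracting that from the `time` output for the whole process gives an estimate of the startup share.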
As soon as a COBOL program does "common" I/O that additional startup time
does not really count, but for using COBOL as a scripting language (which
is possible with cobc -xj) it would be reasonable to provide a switch to
prevent most of the runtime initialization (at least the parts that
check the process environment).
Thank you very much.
Best regards,
Christian.
You're welcome,
Simon
--
* Christian Lademann, ZLS Software GmbH mailto:lademann@xxxxxx
*
* ZLS Software GmbH
* Frankfurter Straße 59 Phone +49-6195-9902-0 mailto:zls@xxxxxx
* D-65779 Kelkheim Fax +49-6195-900600 http://www.zls.de
*
* Managing director: John A. Shuter
* Commercial register: Amtsgericht Königstein, HRB 3105