If you define all of your variables as BINARY-C-LONG, the generated code is closer to your C code.
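For reference, a minimal sketch of such declarations (assuming GnuCOBOL's
USAGE BINARY-C-LONG and reusing the item names from the tspeed programs
quoted below; these lines are illustrative, not the source that produced
the dump that follows):

---
01 i  usage binary-c-long value 0.
01 j  usage binary-c-long value 0.
01 k  usage binary-c-long value 0.
01 k0 usage binary-c-long value 0.
01 k1 usage binary-c-long value 1.
---

With all items declared that way the generated code looks like this (in the
dump below, b_8, b_9, b_10, b_11 and b_12 appear to correspond to i, j, k,
k0 and k1):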
l_2:;
  /* Line: 21 : MOVE : tstspd.bat */
  (*(cob_s64_ptr)(b_9)) = 0;
  (*(cob_s64_ptr)(b_10)) = 0;
  /* Line: 22 : PERFORM : tstspd.bat */
  (*(cob_s64_ptr)(b_8)) = 0;
  for (;;)
    {
      if ( ((*(cob_s64_ptr)(b_8)) >= 10000000) )
        break;
      /* Line: 23 : IF : tstspd.bat */
      if (((int)cob_cmp_s64 (b_10, (*(cob_s64_ptr)(b_11))) == 0))
        {
          /* Line: 24 : ADD : tstspd.bat */
          cob_add (&f_9, &f_8, 0);
          /* Line: 25 : MOVE : tstspd.bat */
          memcpy (b_10, b_12, 8);
        }
      else
        {
          /* ELSE */
          /* Line: 27 : SUBTRACT : tstspd.bat */
          cob_sub (&f_9, &f_8, 0);
          /* Line: 28 : MOVE : tstspd.bat */
          memcpy (b_10, b_11, 8);
        }
      (*(cob_s64_ptr)(b_8)) = ((*(cob_s64_ptr)(b_8)) + 1);
    }
On Wed, Dec 30, 2020 at 10:25 AM Christian Lademann (ZLS) <lademann@xxxxxx> wrote:
Hello Simon,
thank you for these tips.
I've looked at the generated C code. The "killer" seems to be
cob_move(), which is obviously necessary to move data between different
variable types and structures. In this case it's the moves "move 1 to k"
and "move 0 to k". Since source and destination have differing
structures (pic 9 comp <--> constant 0/1) cob_move() does the job.
But a simple modification of the program changes the timing dramatically:
---(tspeed3.cob)---
identification division.
program-id. tspeed3.
data division.
working-storage section.
01 i pic 9(7) comp-5 value 0.
01 j pic s9(14) comp-5 value 0.
01 k pic 9 comp-5 value 0.
01 k0 pic 9 comp-5 value 0.
01 k1 pic 9 comp-5 value 1.
PROCEDURE DIVISION.
move 0 to j, k
perform varying i from 0 by 1 until i >= 10000000
if k = k0
add i to j
move k1 to k
else
subtract i from j
move k0 to k
end-if
end-perform
display j upon console
stop run.
---
$ time ./tspeed3
-00000000000005000000
real 0m0.076s
user 0m0.070s
sys 0m0.003s
---
$ time ./tspeed2
-00000000000005000000
real 0m0.382s
user 0m0.380s
sys 0m0.000s
---
Now the compiler generates a direct assignment instead of a cob_move().
The same could apply to the comparison ("if k = 0" <--> "if k = k0") as
well: in both cases a "cob_cmp_u8()" is generated, but in the latter case
a direct comparison would be possible, just like the direct assignment
is possible (and way faster).
Wouldn't it be possible to optimize these cases at compile time?
-fnotrunc gives the same boost here as well, but it might be "too invasive"
for legacy code.
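As a sketch of why it can be "too invasive" (an illustrative program, not
taken from the thread; the item n and the program name ttrunc are made up):
with the default rules, a store into a binary item is still truncated to its
PICTURE digits, and -fnotrunc removes exactly that behaviour, so results can
change whenever a value overflows the PICTURE:

---(ttrunc.cob)---
identification division.
program-id. ttrunc.
data division.
working-storage section.
01 n pic s9(4) comp-5 value 9999.
procedure division.
add 1 to n
*> with default binary truncation the stored result is still limited to
*> 4 decimal digits; with -fnotrunc the full native value 10000 is kept
display n upon console
stop run.
---

Compiled with and without -fnotrunc the stored value can differ, which is
the kind of semantic change legacy code may depend on.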
Thank you very much.
Best regards,
Christian
On 30.12.20 at 15:27 Simon Sobisch wrote:
> On 30.12.2020 at 11:48 Christian Lademann (ZLS) wrote:
>> Hello gnucobol team
>>
>> first of all let me point out how happy I am that there is an open
>> source COBOL implementation that is impressively capable and compatible
>> and growing from release to release. I've been experimenting with
>> open-cobol and gnucobol for quite some time now.
>
> Thanks. We're happy to be able to provide GnuCOBOL :-)
>
>> While writing some test programs for comparing computational speed, I
>> got the impression that there might be some bottlenecks.
>
> Of course there are differences as the rules for arithmetic differ
> between C and COBOL.
>
>> For example:
>>
>> ---(tspeed2.cob)---
>>
>> identification division.
>> program-id. tspeed2.
>> data division.
>> working-storage section.
>>
>> 01 i pic 9(7) comp-5 value 0.
>> 01 j pic s9(14) comp-5 value 0.
>> 01 k pic 9 comp-5 value 0.
>>
>> procedure division.
>>
>> move 0 to j, k
>> perform varying i from 0 by 1 until i >= 10000000
>> if k = 0
>> add i to j
>> move 1 to k
>> else
>> subtract i from j
>> move 0 to k
>> end-if
>> end-perform
>>
>> display j upon console
>> stop run.
>> ---
>>
>> $ cobc -x -O2 tspeed2.cob
>> $ time ./tspeed2
>> -000000005000000
>>
>> real 0m0.373s
>> user 0m0.367s
>> sys 0m0.003s
>>
>>
>> ---(c-tspeed2.c)---
>>
>> #include <stdio.h>
>>
>> int main(void) {
>> long i, j;
>> int k;
>>
>> j = 0;
>> k = 0;
>>
>> for(i = 0; i < 10000000; i++) {
>> if(k)
>> j -= i;
>> else
>> j += i;
>>
>> k = !k;
>> }
>>
>> printf("j=%ld\n", j);
>> }
>>
>> ---
>>
>> $ make c-tspeed2
>> $ time ./c-tspeed2
>> j=-5000000
>>
>> real 0m0.027s
>> user 0m0.027s
>> sys 0m0.000s
>>
>> The C variant of this benchmark is more than 10 times faster!
>>
>> I know that this is just a very limited-scope look at performance and
>> maybe the comparison is unfair.
>
> That's actually a reasonable test; the only "unfair" part is that you
> have a very similar test, but COBOL "works differently", so you're testing
> different things ;-)
>
>> But I might be missing something.
>>
>> Are there other performance tweaks, besides "-O2"?
>
> To get the fastest COBOL: ensure that GnuCOBOL itself is heavily optimized:
>
> * install the most current C compiler available
>
> * build GMP (or MPIR) from source; this will normally produce a library
> that is bound to at least the current CPU features (after changing the
> CPU you'd want to redo that part)
>
> * get the most current GnuCOBOL version and build it from source with
> CFLAGS="-O2 -march=native", pointing to the installed libgmp/libmpir
>
> That will normally provide you with a better overall speed, in most
> cases more than cobc -O2 will provide.
>
> cobc -O2 something.cob always leads to longer compilation times, but the
> generated modules are often not much faster than with -O; -Os often does
> make a difference in the size of the module; -O0 is sometimes necessary
> if you compile big sources and cobc has to use an outdated compiler (or
> new MSVC...).
>
> The biggest difference you'll normally see for heavy arithmetic is when
> you explicitly disable ANSI truncation rules (also true for other COBOL
> compilers).
>
> cobc -x -fnotrunc tspeed2.cob
>
> will give you a much faster result (around 2 times the runtime of the C
> program, and it actually _does_ nearly the same thing as your C program).
>
> Note: in any case COBOL modules always have the additional workload of
> initializing the GnuCOBOL runtime - which on this machine here takes
> nearly the same time as the complete C program... - so the _actual_
> workload of the COBOL module with -fnotrunc is nearly identical to the C
> program :-)
>
> To calculate the time needed for initializing the runtime: compile and
> profile an "empty" COBOL program.
>
> As soon as a COBOL program does "common" I/O, that additional startup time
> does not really count, but for using COBOL as a scripting language (which
> is possible with cobc -xj) it would be reasonable to provide a switch to
> prevent most of the runtime initialization (at least the parts that
> check the process environment).
>
>> Thank you very much.
>>
>> Best regards,
>> Christian.
>
> You're welcome,
> Simon
>
--
* Christian Lademann, ZLS Software GmbH mailto:lademann@xxxxxx
*
* ZLS Software GmbH
* Frankfurter Straße 59 Telefon +49-6195-9902-0 mailto:zls@xxxxxx
* D-65779 Kelkheim Telefax +49-6195-900600 http://www.zls.de
*
* Geschäftsführung John A. Shuter
* Handelsregister Amtsgericht Königstein, HRB 3105
Cheers
Ron Norman