Re: Speed...

If you define all of your variables as BINARY-C-LONG then the generated code is closer to your C code.

  l_2:;
  /* Line: 21        : MOVE               : tstspd.bat */
  (*(cob_s64_ptr)(b_9)) = 0;
  (*(cob_s64_ptr)(b_10)) = 0;
  /* Line: 22        : PERFORM            : tstspd.bat */
  (*(cob_s64_ptr)(b_8)) = 0;
  for (;;)
  {
    if ( ((*(cob_s64_ptr)(b_8)) >= 10000000) )
      break;
    /* Line: 23        : IF                 : tstspd.bat */
    if (((int)cob_cmp_s64 (b_10, (*(cob_s64_ptr)(b_11))) == 0))
    {
      /* Line: 24        : ADD                : tstspd.bat */
      cob_add (&f_9, &f_8, 0);
      /* Line: 25        : MOVE               : tstspd.bat */
      memcpy (b_10, b_12, 8);
    }
    else
    {
      /* ELSE */
      /* Line: 27        : SUBTRACT           : tstspd.bat */
      cob_sub (&f_9, &f_8, 0);
      /* Line: 28        : MOVE               : tstspd.bat */
      memcpy (b_10, b_11, 8);
    }
    (*(cob_s64_ptr)(b_8)) = ((*(cob_s64_ptr)(b_8)) + 1);
  }                                                                 


On Wed, Dec 30, 2020 at 10:25 AM Christian Lademann (ZLS) <lademann@xxxxxx> wrote:
Hello Simon,

thank you for these tips.

I've looked at the generated C code. The "killer" seems to be
cob_move(), which is obviously necessary to move data between different
variable types and structures. In this case it's the moves "move 1 to k"
and "move 0 to k". Since source and destination have differing
structures (pic 9 comp <--> constant 0/1) cob_move() does the job.
But a simple modification of the program changes the timing dramatically:

---(tspeed3.cob)---

        identification division.
        program-id. tspeed3.

        data division.

        working-storage section.

        01 i pic 9(7) comp-5 value 0.
        01 j pic s9(14) comp-5 value 0.
        01 k  pic 9 comp-5 value 0.
        01 k0 pic 9 comp-5 value 0.
        01 k1 pic 9 comp-5 value 1.

        PROCEDURE DIVISION.

        move 0 to j, k
        perform varying i from 0 by 1 until i >= 10000000
          if k = k0
            add i to j
            move k1 to k
          else
            subtract i from j
            move k0 to k
          end-if
        end-perform

        display j upon console
        stop run.

---

$ time ./tspeed3
-00000000000005000000

real    0m0.076s
user    0m0.070s
sys     0m0.003s

---
$ time ./tspeed2
-00000000000005000000

real    0m0.382s
user    0m0.380s
sys     0m0.000s
---

Now the compiler generates a direct assignment instead of a cob_move().

The same could go for the comparison ("if k = 0" <--> "if k = k0") as
well: In both cases a "cob_cmp_u8()" is generated but in the latter case
a direct comparison would be possible, just like the direct assignment
is possible (and way faster).

Wouldn't it be possible to optimize these cases at compile time?
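
To make the difference concrete, here is a minimal C sketch of the two code shapes discussed above: a generic move that inspects field descriptors at run time, and a direct assignment that is possible once both operands are known to share the same type. All names (demo_field, demo_generic_move, demo_direct_move) are hypothetical and exist only for illustration; this is not libcob's cob_field or cob_move(), which have to handle many more field types and conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for an unsigned native-binary field.
   This is NOT libcob's cob_field; it carries only what the sketch needs. */
typedef struct {
    size_t         size;  /* storage size in bytes: 1, 2, 4 or 8 */
    unsigned char *data;  /* start of the field's storage */
} demo_field;

/* Read a field's value, choosing the width at run time. */
static uint64_t demo_load(const demo_field *f)
{
    switch (f->size) {
    case 1: { uint8_t  v; memcpy(&v, f->data, sizeof v); return v; }
    case 2: { uint16_t v; memcpy(&v, f->data, sizeof v); return v; }
    case 4: { uint32_t v; memcpy(&v, f->data, sizeof v); return v; }
    default: {
        uint64_t v;
        assert(f->size == 8);
        memcpy(&v, f->data, sizeof v);
        return v;
    }
    }
}

/* Write a value into a field, again choosing the width at run time. */
static void demo_store(demo_field *f, uint64_t value)
{
    switch (f->size) {
    case 1: { uint8_t  v = (uint8_t)value;  memcpy(f->data, &v, sizeof v); break; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(f->data, &v, sizeof v); break; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(f->data, &v, sizeof v); break; }
    default: {
        assert(f->size == 8);
        memcpy(f->data, &value, sizeof value);
        break;
    }
    }
}

/* Generic path: every move pays for a function call plus two
   run-time dispatches on the field sizes. */
static void demo_generic_move(const demo_field *src, demo_field *dst)
{
    demo_store(dst, demo_load(src));
}

/* Direct path: both operands are known at compile time to be
   one-byte unsigned fields, so a plain assignment is enough. */
static void demo_direct_move(uint8_t *dst, const uint8_t *src)
{
    *dst = *src;
}
```

Both functions leave the same value in the destination; the direct version is simply something a C compiler can inline and keep in a register, which is presumably where the gain between tspeed2 and tspeed3 comes from.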

-fnotrunc does this boost as well here, but it might be "too invasive"
for legacy code, though.
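
As a rough illustration of why dropping truncation can change behaviour, the following C sketch models a store that keeps only the number of decimal digits named in the PICTURE next to a store that keeps the full binary value. The function names are hypothetical, and the modulo is only a simplified model of picture-based truncation, not what libcob actually executes.

```c
#include <assert.h>
#include <stdint.h>

/* 10^digits for 0 <= digits <= 19 (the largest power of ten
   that still fits in a uint64_t). */
static uint64_t demo_pow10(unsigned digits)
{
    uint64_t p = 1;
    assert(digits <= 19);
    while (digits-- > 0)
        p *= 10;
    return p;
}

/* Simplified model of a truncating store: only the low-order
   decimal digits allowed by the PICTURE survive. */
static uint64_t demo_store_trunc(uint64_t value, unsigned digits)
{
    return value % demo_pow10(digits);
}

/* Simplified model of a non-truncating store: the binary value
   is kept as-is, limited only by the storage size. */
static uint64_t demo_store_notrunc(uint64_t value)
{
    return value;
}
```

Under this model a two-digit counter that reaches 99 and is incremented ends up as demo_store_trunc(100, 2), which is 0, whereas demo_store_notrunc(100) stays 100. A legacy program that relies on the first behaviour would see different results once truncation is switched off.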

Thank you very much.

Best regards,
Christian



On 30.12.20 at 15:27, Simon Sobisch wrote:
> On 30.12.2020 at 11:48, Christian Lademann (ZLS) wrote:
>> Hello gnucobol team
>>
>> first of all let me point out how happy I am that there is an open
>> source COBOL implementation that is impressively capable and compatible
>> and growing from release to release. I've been experimenting with
>> open-cobol and gnucobol for quite some time now.
>
> Thanks. We're happy to be able to provide GnuCOBOL :-)
>
>> While writing some test programs for comparing computational speed, I
>> got the impression that there might be some bottlenecks.
>
> Of course there are differences as the rules for arithmetic differ
> between C and COBOL.
>
>> For example:
>>
>> ---(tspeed2.cob)---
>>
>>         identification division.
>>         program-id. tspeed2.
>>         data division.
>>         working-storage section.
>>
>>         01 i pic 9(7) comp-5   value 0.
>>         01 j pic s9(14) comp-5 value 0.
>>         01 k pic 9 comp-5      value 0.
>>
>>         procedure division.
>>
>>         move 0 to j, k
>>         perform varying i from 0 by 1 until i >= 10000000
>>           if k = 0
>>             add i to j
>>             move 1 to k
>>           else
>>             subtract i from j
>>             move 0 to k
>>           end-if
>>         end-perform
>>
>>         display j upon console
>>         stop run.
>> ---
>>
>> $ cobc -x -O2 tspeed2.cob
>> $ time ./tspeed2
>> -000000005000000
>>
>> real    0m0.373s
>> user    0m0.367s
>> sys    0m0.003s
>>
>>
>> ---(c-tspeed2.c)---
>>
>> #include <stdio.h>
>>
>> int main(void) {
>>      long i, j;
>>      int k;
>>
>>      j = 0;
>>      k = 0;
>>
>>      for(i = 0; i < 10000000; i++) {
>>          if(k)
>>              j -= i;
>>          else
>>              j += i;
>>
>>          k = !k;
>>      }
>>
>>      printf("j=%ld\n", j);
>> }
>>
>> ---
>>
>> $ make c-tspeed2
>> $ time ./c-tspeed2
>> j=-5000000
>>
>> real    0m0.027s
>> user    0m0.027s
>> sys    0m0.000s
>>
>> The C variant of this benchmark is more than 10 times faster!
>>
>> I know that this is just a very narrowly scoped look at performance and
>> maybe the comparison is unfair.
>
> That's actually a reasonable test, the only "unfair" part is that you
> have a very similar test but COBOL "works different" so you're testing
> different things ;-)
>
>> But I might be missing something.
>>
>> Are there other performance tweaks, besides "-O2"?
>
> To get the fastest COBOL, ensure that GnuCOBOL itself is heavily optimized:
>
> * install the most current C compiler available
>
> * build GMP (or MPIR) from source; this will normally produce a library
> that is bound to at least the current CPU features (after changing the
> CPU you'd want to redo that part)
>
> * get the most current GnuCOBOL version and build it from source with
> CFLAGS="-O2 -march=native", pointing to the installed libgmp/libmpir
>
> That will normally provide you with a better overall speed, in most
> cases more than cobc -O2 will provide.
>
> cobc -O2 something.cob always leads to longer compilation times, but the
> generated modules are often not much faster than with -O; -Os often does
> make a difference in the size of the module; -O0 is sometimes necessary
> if you compile big sources and cobc has to use an outdated compiler (or
> new MSVC...).
>
> The biggest difference you'll normally see for heavy arithmetic is when
> you explicitly disable ANSI truncation rules (also true for other COBOL
> compilers).
>
> cobc -x -fnotrunc tspeed2.cob
>
> will give you a much faster result (around twice the run time of the C
> program, and it actually _does_ nearly the same thing as your C program).
>
> Note: in any case COBOL modules always have the additional workload of
> initializing the GnuCOBOL runtime - which on this machine here takes
> nearly the same time as the complete C program... - so the _actual_
> workload of the COBOL module with -fnotrunc is nearly identical to the C
> program :-)
>
> To calculate the time needed for initializing the runtime: compile and
> profile an "empty" COBOL program.
>
> As soon as a COBOL program does "common" io that additional startup time
> does not really count, but for using COBOL as scripting language (which
> is possible with cobc -xj) it would be reasonable to provide a switch to
> prevent most of the runtime initialization (at least the parts that
> check the process environment).
>
>> Thank you very much.
>>
>> Best regards,
>> Christian.
>
> You're welcome,
> Simon
>
> .


--
*  Christian Lademann, ZLS Software GmbH         mailto:lademann@xxxxxx
*
*  ZLS Software GmbH
*  Frankfurter Straße 59    Telefon +49-6195-9902-0   mailto:zls@xxxxxx
*  D-65779 Kelkheim         Telefax +49-6195-900600   http://www.zls.de
*
*  Geschäftsführung John A. Shuter
*  Handelsregister  Amtsgericht Königstein, HRB 3105



--
Cheers
Ron Norman
