BN_MUL_MONT for ARM64 v8


 



  Is big-number Montgomery multiplication in the latest OpenSSL from GitHub as well optimized for ARM64 as it is for x86-64?
  We do not see any vmull (or pmull/pmull2) instructions in armv8-mont.pl.

  On an ARM Cortex-A72 (1 GHz) and an E5-2620 (2.1 GHz) we see roughly a 10x difference in RSA signing performance for 2048-bit keys.
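
For readers unfamiliar with the operation in question, here is a minimal sketch of word-serial Montgomery multiplication in Python. It is the textbook algorithm, not a transcription of armv8-mont.pl or of the x86-64 code; the names mont_n0 and mont_mul and the small 127-bit modulus are illustrative assumptions. It is meant only to show where the word-sized multiplications that dominate the cost come from.

```python
W = 64  # word size in bits, matching bn(64,64) in the build options


def mont_n0(n, w=W):
    """Return -n^{-1} mod 2^w (hypothetical helper name); n must be odd."""
    return (-pow(n, -1, 1 << w)) % (1 << w)


def mont_mul(a, b, n, n0, num_words, w=W):
    """Return a * b * R^{-1} mod n, where R = 2^(w * num_words).

    Requires 0 <= a, b < n and n odd. Python integers stand in for the
    word arrays an assembly implementation would use.
    """
    mask = (1 << w) - 1
    t = 0
    for i in range(num_words):
        b_i = (b >> (w * i)) & mask   # next word of b
        t += a * b_i                  # num_words word-by-word products
        m = ((t & mask) * n0) & mask  # chosen so the low word of t + m*n is 0
        t += m * n                    # num_words more word-by-word products
        t >>= w                       # exact division by 2^w
    if t >= n:                        # t < 2n here, so one subtraction suffices
        t -= n
    return t


# Small illustrative example (not an RSA modulus).
n = (1 << 127) - 1
num_words = 2
R = 1 << (W * num_words)
n0 = mont_n0(n)
a, b = 0x1234567890ABCDEF1234, 0xFEDCBA0987654321
a_m, b_m = (a * R) % n, (b * R) % n           # convert to Montgomery form
prod_m = mont_mul(a_m, b_m, n, n0, num_words)
prod = mont_mul(prod_m, 1, n, n0, num_words)  # convert back
```

For a 2048-bit modulus num_words would be 32, so each call performs on the order of 2 * 32 * 32 word multiplications, which is why the sqr/mul routines dominate the profiles further down.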


  We ran:

          openssl speed rsa2048


Here are the openssl speed numbers.

x86-64:

[root@nuosrv2 openssl]# ./apps/openssl speed rsa2048 
Doing 2048 bit private rsa's for 10s: 13134 2048 bit private RSA's in 9.97s
Doing 2048 bit public rsa's for 10s: 379019 2048 bit public RSA's in 9.98s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPADLOCK_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib64/engines-1.1\""  -Wa,--noexecstack
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000759s 0.000026s   1317.4  37977.9


arm64:

[root@juno openssl]# ./apps/openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 1319 2048 bit private RSA's in 9.92s
Doing 2048 bit public rsa's for 10s: 49209 2048 bit public RSA's in 9.93s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -Wa,--noexecstack
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.007521s 0.000202s    133.0   4955.6



    ARM64 heavy hitters

  

    69.70%  openssl        libcrypto.so.1.1         [.] __bn_sqr8x_mont
    18.64%  openssl        libcrypto.so.1.1         [.] __bn_mul4x_mont
     4.92%  openssl        libcrypto.so.1.1         [.] MOD_EXP_CTIME_COPY_FROM_PREBUF
     1.50%  openssl        libcrypto.so.1.1         [.] bn_mul_add_words


    x86-64 heavy hitters

          

    30.93%  openssl          libcrypto.so.1.1         [.] __bn_sqrx8x_reduction
    17.65%  openssl          libcrypto.so.1.1         [.] bn_sqrx8x_internal
    12.65%  openssl          libcrypto.so.1.1         [.] mulx4x_internal
     8.91%  openssl          libcrypto.so.1.1         [.] bn_mul_add_words
     7.14%  openssl          libcrypto.so.1.1         [.] bn_mulx4x_mont


The code looks quite different between x86-64 and ARM64. Is that due to the ISA, or has the ARM64
code not yet caught up with the highly optimized x86-64 code?

Basically, are we stuck with a 1:5 ratio (if we extrapolate the A72 to 2 GHz), or is there more
optimal code that we should pick up for ARM64? I compiled OpenSSL from the latest GitHub sources.
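
The arithmetic behind the 1:5 figure can be checked with a short Python sketch. It uses the sign/s values reported above and assumes, as the extrapolation does, that throughput scales linearly with clock frequency and that both machines ran at the stated nominal clocks; the variable names are illustrative.

```python
# sign/s figures taken from the openssl speed output above
x86_sign_per_s = 1317.4
arm_sign_per_s = 133.0

# Nominal clocks as stated in this message (assumed, not measured)
x86_ghz = 2.1
arm_ghz = 1.0

# Raw throughput ratio, ignoring the clock difference
raw_ratio = x86_sign_per_s / arm_sign_per_s

# Extrapolate the A72 to 2 GHz, assuming linear scaling with clock
arm_at_2ghz = arm_sign_per_s * (2.0 / arm_ghz)
scaled_ratio = x86_sign_per_s / arm_at_2ghz

# Signatures per GHz on each machine, as a per-clock comparison
per_clock_ratio = (x86_sign_per_s / x86_ghz) / (arm_sign_per_s / arm_ghz)

print(f"raw: {raw_ratio:.2f}x")
print(f"A72 at 2 GHz: {scaled_ratio:.2f}x")
print(f"per clock: {per_clock_ratio:.2f}x")
```

Under these assumptions the gap per clock is a little under 5x, which is where the 1:5 estimate comes from.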





Any pointers will be extremely helpful.


Thanks,
-vijay
