Re: glusterfsd memory leak issue found after enable ssl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi 'Milind Changire' ,

The leak is getting more and more clear to me now. the unsolved memory leak is because of in gluterfs version 3.12.15 (in my env)the ssl context is a shared one, while we do ssl_acept, ssl will allocate some read/write buffer to ssl object, however, ssl_free in socket_reset or fini function of socket.c, the buffer is returened back to ssl context free list instead of completely freed.

 

So following patch is able to fix the memory leak issue completely.(created for gluster master branch)

 

--- a/rpc/rpc-transport/socket/src/socket.c
+++ b/rpc/rpc-transport/socket/src/socket.c
@@ -446,6 +446,7 @@ ssl_setup_connection_postfix(rpc_transport_t *this)
     gf_log(this->name, GF_LOG_DEBUG,
            "SSL verification succeeded (client: %s) (server: %s)",
            this->peerinfo.identifier, this->myinfo.identifier);
+    X509_free(peer);
     return gf_strdup(peer_CN);

     /* Error paths. */
@@ -1157,7 +1158,21 @@ __socket_reset(rpc_transport_t *this)
     memset(&priv->incoming, 0, sizeof(priv->incoming));

     event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);
-
+    if(priv->use_ssl&& priv->ssl_ssl)
+    {
+      gf_log(this->name, GF_LOG_TRACE,
+             "clear and reset for socket(%d), free ssl ",
+             priv->sock);
+               if(priv->ssl_ctx)
+                 {
+                       SSL_CTX_free(priv->ssl_ctx);
+                       priv->ssl_ctx = NULL;
+                 }
+      SSL_shutdown(priv->ssl_ssl);
+      SSL_clear(priv->ssl_ssl);
+      SSL_free(priv->ssl_ssl);
+      priv->ssl_ssl = NULL;
+    }
     priv->sock = -1;
     priv->idx = -1;
     priv->connected = -1;
@@ -4675,6 +4690,21 @@ fini(rpc_transport_t *this)
         pthread_mutex_destroy(&priv->out_lock);
         pthread_mutex_destroy(&priv->cond_lock);
         pthread_cond_destroy(&priv->cond);
+               if(priv->use_ssl&& priv->ssl_ssl)
+               {
+                 gf_log(this->name, GF_LOG_TRACE,
+                                "clear and reset for socket(%d), free ssl ",
+                                priv->sock);
+                 if(priv->ssl_ctx)
+                 {
+                       SSL_CTX_free(priv->ssl_ctx);
+                       priv->ssl_ctx = NULL;
+                 }
+                 SSL_shutdown(priv->ssl_ssl);
+                 SSL_clear(priv->ssl_ssl);
+                 SSL_free(priv->ssl_ssl);

From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 2:12 PM
To: 'Amar Tumballi Suryanarayan' <atumball@xxxxxxxxxx>
Cc: 'Milind Changire' <mchangir@xxxxxxxxxx>; 'gluster-devel@xxxxxxxxxxx' <gluster-devel@xxxxxxxxxxx>
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

 

Hi,

From our test valgrind and libleak all blame ssl3_accept

///////////////////////////from valgrind attached to glusterfds///////////////////////////////////////////

==16673== 198,720 bytes in 12 blocks are definitely lost in loss record 1,114 of 1,123
==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673==    by 0xA855E0C: ssl3_setup_write_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA855E77: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
==16673== 200,544 bytes in 12 blocks are definitely lost in loss record 1,115 of 1,123
==16673==    at 0x4C2EB7B: malloc (vg_replace_malloc.c:299)
==16673==    by 0x63E1977: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.2p)
==16673==    by 0xA855D12: ssl3_setup_read_buffer (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA855E68: ssl3_setup_buffers (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA8485D9: ssl3_accept (in /usr/lib64/libssl.so.1.0.2p)
==16673==    by 0xA610DDF: ssl_complete_connection (socket.c:400)
==16673==    by 0xA617F38: ssl_handle_server_connection_attempt (socket.c:2409)
==16673==    by 0xA618420: socket_complete_connection (socket.c:2554)
==16673==    by 0xA618788: socket_event_handler (socket.c:2613)
==16673==    by 0x4ED6983: event_dispatch_epoll_handler (event-epoll.c:587)
==16673==    by 0x4ED6C5A: event_dispatch_epoll_worker (event-epoll.c:663)
==16673==    by 0x615C5D9: start_thread (in /usr/lib64/libpthread-2.27.so)
==16673==
valgrind --leak-check=f

 

 

////////////////////////////////////with libleak attached to glusterfsd/////////////////////////////////////////

callstack[2419] expires. count=1 size=224/224 alloc=362 free=350
    /home/robot/libleak/
libleak.so(malloc+0x25) [0x7f1460604065]
    /lib64/
libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
    /lib64/
libcrypto.so.10(EVP_DigestInit_ex+0x2a9) [0x7f145ed95749]
    /lib64/
libssl.so.10(ssl3_digest_cached_records+0x11d) [0x7f145abb6ced]
    /lib64/
libssl.so.10(ssl3_accept+0xc8f) [0x7f145abadc4f]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc16d) [0x7f145ae0816d]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc68a) [0x7f145ae0868a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc9f2) [0x7f145ae089f2]
    /lib64/
libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
    /lib64/
libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
    /lib64/
libpthread.so.0(+0x75da) [0x7f145f0d15da]
    /lib64/
libc.so.6(clone+0x3f) [0x7f145e9a7eaf]

callstack[2432] expires. count=1 size=104/104 alloc=362 free=0
    /home/robot/libleak/
libleak.so(malloc+0x25) [0x7f1460604065]
    /lib64/
libcrypto.so.10(CRYPTO_malloc+0x58) [0x7f145ecd9978]
    /lib64/
libcrypto.so.10(BN_MONT_CTX_new+0x17) [0x7f145ed48627]
    /lib64/
libcrypto.so.10(BN_MONT_CTX_set_locked+0x6d) [0x7f145ed489fd]
    /lib64/
libcrypto.so.10(+0xff4d9) [0x7f145ed6a4d9]
    /lib64/
libcrypto.so.10(int_rsa_verify+0x1cd) [0x7f145ed6d41d]
    /lib64/
libcrypto.so.10(RSA_verify+0x32) [0x7f145ed6d972]
    /lib64/
libcrypto.so.10(+0x107ff5) [0x7f145ed72ff5]
    /lib64/
libcrypto.so.10(EVP_VerifyFinal+0x211) [0x7f145ed9dd51]
    /lib64/
libssl.so.10(ssl3_get_cert_verify+0x5bb) [0x7f145abac06b]
    /lib64/
libssl.so.10(ssl3_accept+0x988) [0x7f145abad948]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(ssl_complete_connection+0x5e) [0x7f145ae00f3a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc16d) [0x7f145ae0816d]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc68a) [0x7f145ae0868a]
    /usr/lib64/glusterfs/3.12.15/rpc-transport/
socket.so(+0xc9f2) [0x7f145ae089f2]
    /lib64/
libglusterfs.so.0(+0x9b96f) [0x7f146038596f]
    /lib64/
libglusterfs.so.0(+0x9bc46) [0x7f1460385c46]
    /lib64/
libpthread.so.0(+0x75da) [0x7f145f0d15da]
    /lib64/
libc.so.6(clone+0x3f) [0x7f145e9a7eaf]

 

one interesting thing is that the memory goes up to about 300m then it stopped  increasing !!!

I am wondering if this is caused by open-ssl library? But when I search from openssl community, there is no such issue reported before.

Is glusterfs using ssl_accept correctly?

 

cynthia

From: Zhou, Cynthia (NSB - CN/Hangzhou)
Sent: Monday, May 06, 2019 10:34 AM
To: 'Amar Tumballi Suryanarayan' <atumball@xxxxxxxxxx>
Cc: Milind Changire <mchangir@xxxxxxxxxx>; gluster-devel@xxxxxxxxxxx
Subject: RE: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

 

Hi,

Sorry, I am so busy with other issues these days, could you help me to submit my patch for review? It is based on glusterfs3.12.15 code. But even with this patch , memory leak still exists, from memory leak tool it should be related with ssl_accept, not sure if it is because of openssl library or because improper use of ssl interfaces.

--- a/rpc/rpc-transport/socket/src/socket.c

+++ b/rpc/rpc-transport/socket/src/socket.c

@@ -1019,7 +1019,16 @@ static void __socket_reset(rpc_transport_t *this) {

   memset(&priv->incoming, 0, sizeof(priv->incoming));

 

   event_unregister_close(this->ctx->event_pool, priv->sock, priv->idx);

-

+  if(priv->use_ssl&& priv->ssl_ssl)

+  {

+    gf_log(this->name, GF_LOG_INFO,

+           "clear and reset for socket(%d), free ssl ",

+           priv->sock);

+    SSL_shutdown(priv->ssl_ssl);

+    SSL_clear(priv->ssl_ssl);

+    SSL_free(priv->ssl_ssl);

+    priv->ssl_ssl = NULL;

+  }

   priv->sock = -1;

   priv->idx = -1;

   priv->connected = -1;

@@ -4238,6 +4250,16 @@ void fini(rpc_transport_t *this) {

     pthread_mutex_destroy(&priv->out_lock);

     pthread_mutex_destroy(&priv->cond_lock);

     pthread_cond_destroy(&priv->cond);

+     if(priv->use_ssl&& priv->ssl_ssl)

+  {

+    gf_log(this->name, GF_LOG_INFO,

+           "clear and reset for socket(%d), free ssl ",

+           priv->sock);

+    SSL_shutdown(priv->ssl_ssl);

+    SSL_clear(priv->ssl_ssl);

+    SSL_free(priv->ssl_ssl);

+    priv->ssl_ssl = NULL;

+  }

     if (priv->ssl_private_key) {

       GF_FREE(priv->ssl_private_key);

     }

 

 

From: Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx>
Sent: Wednesday, May 01, 2019 8:43 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>
Cc: Milind Changire <mchangir@xxxxxxxxxx>; gluster-devel@xxxxxxxxxxx
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

 

Hi Cynthia Zhou,

 

Can you post the patch which fixes the issue of missing free? We will continue to investigate the leak further, but would really appreciate getting the patch which is already worked on land into upstream master.

 

-Amar

 

On Mon, Apr 22, 2019 at 1:38 PM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx> wrote:

Ok, I am clear now.

Ive added ssl_free in socket reset and socket finish function, though glusterfsd memory leak is not that much, still it is leaking, from source code I can not find anything else,

Could you help to check if this issue exists in your env? If not I may have a try to merge your patch .

Step

1>   while true;do gluster v heal <vol-name> info,

2>   check the vol-name glusterfsd memory usage, it is obviously increasing.

cynthia

 

From: Milind Changire <mchangir@xxxxxxxxxx>
Sent: Monday, April 22, 2019 2:36 PM
To: Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.zhou@xxxxxxxxxxxxxxx>
Cc: Atin Mukherjee <amukherj@xxxxxxxxxx>; gluster-devel@xxxxxxxxxxx
Subject: Re: [Gluster-devel] glusterfsd memory leak issue found after enable ssl

 

According to BIO_new_socket() man page ...

 

If the close flag is set then the socket is shut down and closed when the BIO is freed.

 

For Gluster to have more control over the socket shutdown, the BIO_NOCLOSE flag is set. Otherwise, SSL takes control of socket shutdown whenever BIO is freed.

 

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel


 

--

Amar Tumballi (amarts)

_______________________________________________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/836554017

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/486278655

Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel


[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux