On 25/11/2019 08:45, fergtm@xxxxxxxxxxx wrote:
> Sorry to bring this up again but I really don't know how to fix it. I
> already re-wrote my code to use SSL_read/SSL_write instead of an SSL
> filter BIO but I still get the same error.
>
> I can reproduce it when the sender is nginx, socat openssl-listen or
> openssl s_server. Both the server and client are running on the same
> machine.
>
> The SSL object is not using a socket BIO; instead I use a BIO pair. I may
> be using the BIO pair incorrectly but I haven't found any complete
> examples of how to use them.
>
> It works perfectly if I use a debug build of OpenSSL

This suggests it *could* be a compiler bug. You might want to experiment
with different optimization levels to see if that makes a difference.

Matt

>
> Thanks
>
> -----Original Message-----
> From: openssl-users <openssl-users-bounces@xxxxxxxxxxx> On Behalf Of Fernando Gutierrez Mendez
> Sent: Monday, November 18, 2019 2:34 PM
> To: openssl-users@xxxxxxxxxxx
> Subject: Re: ssl3_get_record:decryption failed on some machines
>
> The writer is my own code but I can also reproduce the problem when the
> server is nginx and the client is my app.
>
> In my code I do not use OpenSSL socket BIOs; instead I do reads/writes
> through a BIO pair:
>
>     pairBase = BIO_new(BIO_s_bio());
>     pairInt = BIO_new(BIO_s_bio());
>
>     [...]
>
>     BIO_make_bio_pair(pairBase, pairInt);
>
>     [...]
>
>     sslBIO = BIO_new_ssl(ssl_ctx, 1 /* Client */);
>
>     [...]
>
>     BIO_push(sslBIO, pairInt);
>
> After each BIO_read/BIO_write to sslBIO I read/write any available data
> from the network to pairBase.
>
> I think I'm handling partial writes correctly:
>
>     SSL_CTX_set_mode(ssl_ctx, SSL_MODE_AUTO_RETRY |
>         SSL_MODE_ENABLE_PARTIAL_WRITE | SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
>
>     [...]
>
>     ret = BIO_write(sslBIO, buf, (int)length);
>
>     if (ret <= 0 && !BIO_should_retry(sslBIO))
>     {
>         /* Handle error */
>         return;
>     }
>
>     if (ret > 0)
>     {
>         buf = ((uint8_t *)buf) + (size_t)ret;
>         length -= (size_t)ret;
>     }
>
> but again the problem reproduces even if the writer is nginx.
>
> Thanks
>
> On Mon, Nov 18, 2019 at 02:19:30PM -0500, Viktor Dukhovni wrote:
>>> On Nov 18, 2019, at 1:44 PM, Fernando Gutierrez Mendez <fergtm@xxxxxxxxxxx> wrote:
>>>
>>> I use non-blocking IO with an SSL BIO, so a call to BIO_read eventually
>>> returns -1. When this happens I call BIO_should_retry to test whether
>>> this is due to an error or because of the underlying non-blocking
>>> transport.
>>
>> Is the writer side also non-blocking? Is it your own code?
>>
>>> This code works correctly, but after transferring between 1Mb and 5Mb
>>> (it varies every time) BIO_should_retry returns false and SSL_get_error
>>> returns SSL_ERROR_SSL. The error is
>>> "139964546914112:error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:../ssl/record/ssl3_record.c:677"
>>
>> One way to get a decryption integrity failure is for a non-blocking
>> writer to not handle partial writes correctly. If, on an incomplete
>> write, the writer resends the whole buffer rather than only what it
>> failed to send last time, the TCP stream ends up stuttering
>> ciphertext, and the reader sees data integrity errors.
>>
>> This can be seen by looking for unexpected runs of repeated ciphertext
>> in a PCAP capture of the data.
>>
>> Whether the data sent to a particular reader ever ends up blocked at
>> the TCP layer for a given writer can depend on various network-layer
>> issues, making some machines more prone to problems than others.
>>
>> --
>> Viktor.
>>
>
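
For anyone following the thread, the partial-write handling Viktor
describes can be sketched without OpenSSL at all. This is only an
illustration under stated assumptions, not the poster's code: mock_sink,
mock_write and send_all are hypothetical names, and mock_write stands in
for any non-blocking write call that may accept fewer bytes than requested
or ask to be retried. The point is that each retry resumes from the first
unsent byte instead of resending the whole buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a non-blocking transport (not an OpenSSL API).
 * It accepts at most `chunk` bytes per call and reports a retryable
 * "would block" condition (-1) on every second call. */
struct mock_sink {
    uint8_t data[256];  /* everything the "peer" has received so far */
    size_t  used;       /* number of valid bytes in data */
    size_t  chunk;      /* maximum bytes accepted per successful call */
    int     calls;      /* total number of write attempts */
};

static int mock_write(struct mock_sink *s, const void *buf, size_t len)
{
    s->calls++;
    if (s->calls % 2 == 0)
        return -1;                          /* retry later */

    size_t n = len < s->chunk ? len : s->chunk;
    if (n > sizeof(s->data) - s->used)
        n = sizeof(s->data) - s->used;      /* sink is (almost) full */
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return (int)n;                          /* 0 means no room left */
}

/* Send `length` bytes from `buf`, advancing past whatever was accepted.
 * Returns the number of bytes actually delivered. */
static size_t send_all(struct mock_sink *s, const void *buf, size_t length)
{
    const uint8_t *p = buf;
    size_t sent = 0;

    while (length > 0) {
        int ret = mock_write(s, p, length);
        if (ret < 0)
            continue;   /* real code would wait for writability here */
        if (ret == 0)
            break;      /* no progress possible; give up */

        /* Partial write: skip only the bytes that were accepted. */
        p      += (size_t)ret;
        length -= (size_t)ret;
        sent   += (size_t)ret;
    }
    return sent;
}
```

Calling send_all on a mock_sink with chunk set to 5 and a 20-byte message
should leave the sink holding exactly those 20 bytes in order. A writer
that restarted from the beginning of the buffer after each short write
would instead produce the repeated runs of data described above.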