On 4/29/2021 8:25 AM, Rob Emery wrote:
Hello,

We have a problem where intermittently users are getting a plaintext 400 Bad Request response in the middle of the TLS handshake (always the 6th packet in the TCP stream); it currently happens in about 1 in 40K requests. As far as we can tell, there is no difference between a successful connection from a client and these failures (we have confirmed that all the options in the handshake are identical apart from the session/random level components).

We have traffic captures of the problem occurring (see attached screenshot with the end-user’s IP redacted) and it happens fairly frequently for us in our production environment (about 10 per hour or similar).

We've examined the user agents etc. that those requests usually come from and see a mixture of different types of clients (PHP + Curl, Firefox, Chrome, Safari, Java, Python) and operating systems (iOS, Linux, Windows 10, Android), so there doesn't appear to be any commonality between the clients.

There’s a firewall performing NAT between the client and the httpd instance, and the error is definitely coming from httpd as the traffic captures were taken on the physical interface that httpd is listening on. It is happening on multiple (> 5) servers that share nothing, so we don’t think it could be a physical issue.

They’re apache2 2.4.25-3+deb9u7 on Debian 9. This is 2 minor patches behind the latest; however, we have reviewed the patches and there doesn’t seem to be any way those changes could affect this behaviour.

We have also read through the changelog for Apache2; the only possibly related change that we can see is in 2.4.38:

  *) mod_ssl: Fix the error code returned in an error path of 'ssl_io_filter_handshake()'. This messes-up error handling performed in 'ssl_io_filter_error()' [Yann Ylavic]

However, that change only resolves a situation where httpd returns a 502 when it should return a 400, so we don’t think that’s related.
We spent a good portion of yesterday reviewing the mod_ssl code, however we weren't able to identify a situation where this would happen. We have logging at “warn” everywhere, however these requests don’t show in either the access or error log when we check for them.

We are currently trying to get this reproduced in a lab environment so we can increase the log levels etc however any guidance as to where to focus our efforts would be much appreciated.

Thanks
Rob

Other relevant information we can think of:

apache2 2.4.25-3+deb9u7
Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 GNU/Linux
openssl 1.1.0j-1~deb9u1

We’re using: mod_ssl and mpm_worker with:

    StartServers 2
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadLimit 64
    ThreadsPerChild 25
    MaxRequestWorkers 150
    MaxConnectionsPerChild 0

Other modules we have enabled are:

    access_compat.load
    alias.load
    auth_basic.load
    authn_core.load
    authn_file.load
    authz_core.load
    authz_host.load
    authz_user.load
    deflate.load
    dir.load
    env.load
    filter.load
    headers.load
    lbmethod_byrequests.load
    mime.load
    negotiation.load
    proxy_balancer.load
    proxy_html.load
    proxy_http.load
    proxy.load
    rewrite.load
    setenvif.load
    slotmem_shm.load
    socache_shmcb.load
    status.load
    xml2enc.load

Example of the site (edited for brevity):

<VirtualHost 10.1.17.209:443>
    ServerName example.com
    ErrorLog ${APACHE_LOG_DIR}/example.com.error.log
    LogLevel warn
    CustomLog ${APACHE_LOG_DIR}/example.com.access.log vhost_combined_cw_tls env=!dontlog

    #Enable mod-deflate for everything except images
    SetOutputFilter DEFLATE
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip

    RewriteEngine On
    RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|OPTIONS|POST|PUT|DELETE|PATCH)$ [NC]
    RewriteRule .* "-" [F]

    RequestHeader unset X-Forwarded-For
    RequestHeader unset X-Forwarded-Host
    RequestHeader unset X-Forwarded-Proto
    RequestHeader unset Max-Forwards

    SSLEngine On
    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
    SSLProxyEngine on
    SSLProxyVerify none
    SSLHonorCipherOrder on

    # MSIE 2-6
    BrowserMatch "MSIE [2-6]" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
    # MSIE 7 and newer should be able to use keepalive
    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

    RequestHeader set X-ClientSSLProtocol "%{SSL_PROTOCOL}s"
    RequestHeader set X-ClientSSLCipher "%{SSL_CIPHER}s"
    RequestHeader set X_FORWARDED_PROTO "https"
    RequestHeader set X-Forwarded-Proto "https"

    SSLCertificateFile /etc/apache2/ssl/example.com/_.example.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/example.com/_.example.com.key
    SSLCertificateChainFile /etc/apache2/ssl/example.com/chain.crt

    RewriteRule ^(.*)$ http://upstreamserver/$1 [P,QSA]
</VirtualHost>

Can you tell from the user agents whether they indicate old clients? I see you disable TLSv1.1. You could try temporarily enabling TLSv1.1 and see if those failures stop; then you would know it's probably old clients that can't talk TLSv1.2. The percentage of failures likely doesn't fit that, though.
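
For reference, the temporary test described above would amount to a one-line change in the posted vhost. This is only a sketch, assuming the SSLProtocol line shown in the quoted configuration is the one that actually applies to these connections:

```apache
# Temporary diagnostic change: re-allow TLSv1.1 while watching the failure rate.
# The original line was: SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
# Revert once the test is done.
SSLProtocol all -SSLv3 -TLSv1
```

If the rate of plaintext 400 responses stays the same with TLSv1.1 allowed, old clients are unlikely to be the cause.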

If not already included, you could add %{SSL_PROTOCOL}x %{SSL_CIPHER}x to your request log and see if there is any commonality between the failing requests. That assumes the connection stays open long enough for the logging to occur, or that the client's desired protocol and cipher get listed anyway.
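
As a rough sketch of what that could look like: the definition of your vhost_combined_cw_tls format isn't shown in the post, so the format below is an assumption based on Debian's stock vhost_combined, and the name vhost_combined_tls_diag is made up for illustration.

```apache
# Hypothetical format name; the fields before the SSL variables follow
# Debian's default vhost_combined format.
# %{SSL_PROTOCOL}x and %{SSL_CIPHER}x are the mod_ssl log-format extension.
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" vhost_combined_tls_diag

# Inside the <VirtualHost> block, in place of the existing CustomLog line:
CustomLog ${APACHE_LOG_DIR}/example.com.access.log vhost_combined_tls_diag env=!dontlog
```

If the handshake never completes, those two fields will most likely be logged as "-", assuming the request is logged at all.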

Jim