Re: Intermittently the TLS handshake results in plaintext 400 Bad Request response

Jim Albert <jim@xxxxxxxxxxxxx> · Thu, 29 Apr 2021 08:58:13 -0400

On 4/29/2021 8:25 AM, Rob Emery wrote:

Hello,

We have a problem where users intermittently get a plaintext
400 Bad Request response in the middle of the TLS handshake (always
the 6th packet in the TCP stream); it currently happens in about 1 in
40K requests. As far as we can tell, there is no difference between a
successful connection from a client and these failures (we have
confirmed that all the options in the handshake are identical apart
from the session/random level components).

We have traffic captures of the problem occurring (see attached
screenshot with the end-user’s IP redacted) and it happens fairly
frequently for us in our production environment (about 10 per hour or
similar).

We've examined the user agents that those requests usually come
from and see a mixture of client types (PHP + Curl, Firefox, Chrome,
Safari, Java, Python) and operating systems (iOS, Linux, Windows 10,
Android), so there doesn't appear to be any commonality between the
clients.

There’s a firewall performing NAT between the client and the httpd
instance and the error is definitely coming from httpd as the traffic
captures were taken on the physical interface that httpd is listening
on. It is happening on multiple (> 5) servers that share nothing so we
don’t think it could be a physical issue.

They’re apache2 2.4.25-3+deb9u7 on Debian 9. This is two minor patches
behind the latest; however, we have reviewed the patches and there
doesn’t seem to be any way those changes could affect this behaviour.
We have also read through the changelog for Apache2, the only possible
related change that we can see is in 2.4.38:

 *) mod_ssl: Fix the error code returned in an error path of
     'ssl_io_filter_handshake()'. This messes-up error handling performed
     in 'ssl_io_filter_error()' [Yann Ylavic]

However that change only resolves a situation where httpd returns a
502 when it should return a 400, so we don’t think that’s related. We
spent a good portion of yesterday reviewing the mod_ssl code, however
we weren't able to identify a situation where this would happen.

We have logging at “warn” everywhere, however these requests don’t
show in either the access or error log when we check for them.

We are currently trying to reproduce this in a lab environment so
we can increase the log levels etc.; however, any guidance as to where
to focus our efforts would be much appreciated.

Thanks
Rob

Other relevant information we can think of:

apache2 2.4.25-3+deb9u7
Linux 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 GNU/Linux
openssl 1.1.0j-1~deb9u1

We’re using: mod_ssl and mpm_worker with:

    StartServers            2
    MinSpareThreads        25
    MaxSpareThreads        75
    ThreadLimit            64
    ThreadsPerChild        25
    MaxRequestWorkers        150
    MaxConnectionsPerChild    0

Other modules we have enabled are:

access_compat.load
alias.load
auth_basic.load
authn_core.load
authn_file.load
authz_core.load
authz_host.load
authz_user.load
deflate.load
dir.load
env.load
filter.load
headers.load
lbmethod_byrequests.load
mime.load
negotiation.load
proxy_balancer.load
proxy_html.load
proxy_http.load
proxy.load
rewrite.load
setenvif.load
slotmem_shm.load
socache_shmcb.load
status.load
xml2enc.load

Example of the site (edited for brevity):

<VirtualHost 10.1.17.209:443>
    ServerName example.com

    ErrorLog ${APACHE_LOG_DIR}/example.com.error.log
    LogLevel warn

    CustomLog ${APACHE_LOG_DIR}/example.com.access.log vhost_combined_cw_tls env=!dontlog

    #Enable mod-deflate for everything except images
    SetOutputFilter DEFLATE
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip

    RewriteEngine On

    RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|OPTIONS|POST|PUT|DELETE|PATCH)$ [NC]

    RewriteRule .* "-" [F]

    RequestHeader unset X-Forwarded-For
    RequestHeader unset X-Forwarded-Host
    RequestHeader unset X-Forwarded-Proto
    RequestHeader unset Max-Forwards

    SSLEngine On
    SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
    SSLProxyEngine on
    SSLProxyVerify none
    SSLHonorCipherOrder on

    # MSIE 2-6
    BrowserMatch "MSIE [2-6]" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0

    # MSIE 7 and newer should be able to use keepalive
    BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

    RequestHeader set X-ClientSSLProtocol "%{SSL_PROTOCOL}s"
    RequestHeader set X-ClientSSLCipher "%{SSL_CIPHER}s"
    RequestHeader set X_FORWARDED_PROTO "https"
    RequestHeader set X-Forwarded-Proto "https"

    SSLCertificateFile /etc/apache2/ssl/example.com/_.example.com.crt
    SSLCertificateKeyFile /etc/apache2/ssl/example.com/_.example.com.key
    SSLCertificateChainFile /etc/apache2/ssl/example.com/chain.crt

    RewriteRule ^(.*)$ http://upstreamserver/$1 [P,QSA]

</VirtualHost>

Can you tell from the user agents if they indicate old clients?
I see you disable TLSv1.1.

You could try temporarily enabling TLSv1.1 and see if those failures
stop and then you would know it's probably old clients that can't talk
TLSv1.2. Although the percentage of failures likely doesn't fit that.
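
As a rough sketch of that test, assuming the vhost shown above, the
SSLProtocol line could be relaxed like this for the duration of the
experiment and reverted afterwards:

```apache
# Temporary diagnostic change: allow TLSv1.1 in addition to TLSv1.2.
# The original line was: SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
SSLProtocol all -SSLv3 -TLSv1
```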

If not already included, you could include %{SSL_PROTOCOL}x
%{SSL_CIPHER}x in your request log and see if there is any commonality
in requests assuming the communication is open long enough for the
logging to occur or if the client's desired protocol and cipher might
get listed.
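
For illustration only, something along these lines would add the two
fields. The format name vhost_combined_tls_debug is made up here, and
the leading fields are an assumption modelled on Debian's stock
vhost_combined format; your existing vhost_combined_cw_tls format may
well include these fields already:

```apache
# Hypothetical log format: Debian-style vhost_combined plus the
# negotiated TLS protocol and cipher from mod_ssl.
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" vhost_combined_tls_debug

# Inside the VirtualHost, in place of the current CustomLog line
CustomLog ${APACHE_LOG_DIR}/example.com.access.log vhost_combined_tls_debug env=!dontlog
```

These fields are only filled in for requests that actually reach the
access log, so they may not show anything for handshakes that fail
before a request is read.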

Jim
