Re: Securing Ceph with TLS

Kadir Ozdemir <kozdemir@xxxxxxxxxxxxxx> · Mon, 12 Mar 2018 16:54:03 -0700

Hi Greg,

Thank you for the comments.

I was recently made aware of msgr2. What I know about it comes from
http://docs.ceph.com/docs/master/dev/msgr2/.

If the plaintext data to be transferred over TLS is larger than 16K, it
is divided into chunks; the maximum size of a plaintext chunk is 16K.
TLS adds a small header to each record, and the encryption algorithm
can add some more bytes (e.g., padding). If compression is enabled, the
compression algorithm can also add bytes when the data does not
compress well. There is no minimum size for the plaintext data, so
small messages can be encrypted individually and sent immediately.
This is what our code currently does.
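
To make the chunking concrete, here is a minimal Python sketch (not our
actual code) of how plaintext maps onto records of at most 16K. The
helper name split_into_records is made up for illustration; only the
16384-byte plaintext limit per record comes from TLS itself.

```python
# Illustrative only: split_into_records is a hypothetical helper,
# not part of Ceph or OpenSSL.
MAX_PLAINTEXT = 16 * 1024  # maximum plaintext bytes per TLS record (2^14)


def split_into_records(payload: bytes, max_len: int = MAX_PLAINTEXT) -> list:
    """Split payload into chunks of at most max_len bytes, one per record."""
    return [payload[i:i + max_len] for i in range(0, len(payload), max_len)]


small = split_into_records(b"ack")        # a small message fits in one record
large = split_into_records(b"x" * 40000)  # a larger message spans several
print(len(small), [len(chunk) for chunk in large])
```

A message of a few tens of bytes, such as an acknowledgement, becomes a
single short record and does not wait for the 16K buffer to fill.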

The OpenSSL library requires us to specify, when an SSL context is
created, whether it will be used by a client, a server, or both (i.e.,
generic). A certificate to be used by an SSL context can likewise be of
type client, server, or both. Assuming that not every environment would
support generic certificates, we decided to create separate SSL
contexts, one for initiating connections (i.e., for the client role)
and the other for accepting connections (for the server role), even if
the same certificate is used for both roles.
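
As a rough analogy only, the same split can be shown with Python's ssl
module, which also fixes the role when the context is created (in the
OpenSSL C API the corresponding choice is TLS_client_method(),
TLS_server_method(), or TLS_method()). The helper load_identity and the
file names are placeholders, not names from our code.

```python
import ssl

# One context per role, chosen at creation time.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # connections we initiate
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)  # connections we accept


def load_identity(ctx, cert_file, key_file, ca_file):
    """Load a certificate, its private key, and a trusted CA into one context."""
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.load_verify_locations(cafile=ca_file)


# The same certificate may be loaded into both contexts if it is valid
# for both roles (paths below are placeholders):
# load_identity(client_ctx, "node.crt", "node.key", "ca.crt")
# load_identity(server_ctx, "node.crt", "node.key", "ca.crt")
```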

After the servers have switched to "required", the clients can stay at
"desired" and it will work. However, it is recommended to switch the
clients to "required" too at some point. Otherwise, the clients would
be vulnerable to man-in-the-middle attacks in which the TLS connections
are downgraded to plain TCP connections.
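
The rules from the original proposal (quoted below) can be sketched as
a small Python function. The mode names are the proposed values of the
"tls" parameter; the function negotiate itself is hypothetical and only
restates those rules.

```python
# Hypothetical helper: the mode names come from the proposal,
# the function does not exist in Ceph.
VALID_MODES = {"none", "desired", "required"}
TLS_CAPABLE = {"desired", "required"}


def negotiate(local: str, remote: str) -> str:
    """Return "tls" or "plain" for a pair of tls settings, or raise on a mismatch."""
    for mode in (local, remote):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown tls mode: {mode}")
    if local in TLS_CAPABLE and remote in TLS_CAPABLE:
        return "tls"
    if "required" in (local, remote):
        # One side insists on TLS and the other has it disabled.
        raise ConnectionError("tls is required on one side but disabled on the other")
    return "plain"
```

Note that negotiate("desired", "none") returns "plain". A client left at
"desired" will accept a plain session from a peer that appears not to
support TLS, which is the downgrade window mentioned above.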

There are two main reasons that our security team prefers separate
ports for TLS connections. The first one is to eliminate any plaintext
message exchanges required to switch to TLS, and thus, eliminate any
vulnerabilities this may open up because of some software bugs. The
second one is the ability to block all plain TCP ports using firewall
rules to make sure that all connections are encrypted.
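
For the separate-port option, the order in which a client tries ports
can be sketched as follows. This is an illustration under assumptions:
ports_to_try is a made-up helper and the port numbers in the example
are arbitrary, not proposed values.

```python
# Hypothetical helper and port numbers, for illustration only.
def ports_to_try(mode: str, tls_port: int, plain_port: int) -> list:
    """Return the ports a client attempts, in order, for a given tls mode."""
    if mode == "required":
        return [tls_port]              # never falls back to plain TCP
    if mode == "desired":
        return [tls_port, plain_port]  # TLS first, then plain TCP
    if mode == "none":
        return [plain_port]            # TLS not enabled
    raise ValueError(f"unknown tls mode: {mode}")


print(ports_to_try("desired", tls_port=7789, plain_port=6789))
```

Once every client and server is at "required", nothing ever appears in
the list except the TLS port, so the plain TCP ports can be blocked at
the firewall.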

Hope this helps.

Kadir

On Mon, Mar 12, 2018 at 2:35 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> This is very exciting! With the way you've described it slotting in, I
> don't think there should be any problems getting the code merged
> upstream.
>
> But you will want to be aware of the work on "msgr2". I believe it's
> principally been Ricardo Dias working on that from a template we've
> discussed in the past. One of its goals is adding encryption, but that
> sits inside of the Ceph messaging protocol rather than on top of it
> and they should be able to coexist happily.
>
> Other comments inline.
>
> On Tue, Mar 6, 2018 at 6:12 PM, Kadir Ozdemir <kozdemir@xxxxxxxxxxxxxx> wrote:
>> Hi All,
>>
>>
>> <snip>
>>
>> There are two main objectives that shape the design to be described
>> here. The first one is to change the existing Ceph code and behavior
>> minimally. The second one is to support Simple and Async Messengers.
>> The main ideas behind the design are as follows:
>>
>> - Introduce a class called Socket to replace the socket descriptor in
>> the Pipe and AsyncConnection classes. Socket will wrap the socket
>> operations used currently in both Messengers. The socket descriptor
>> will be an attribute of Socket. Essentially, the system calls for
>> socket operations such as send and recv will be replaced by the
>> corresponding member functions in this class. This is to achieve the
>> minimal code change objective. Socket implements the plain TCP socket.
>>
>> - Introduce a class called TlsSocket to implement TLS specific
>> behavior common to both Simple and Async Messengers, such as
>> retrieving SSL Context, initiating TLS handshake, and reading from and
>> writing to the SSL object. It maintains two sets each of which
>> contains a lock, buffer, and BIO object; one set for the receive path
>> and the other for the send path. The locks serialize the access to
>> SSL, BIO, buffer and socket on the receive and send paths. The buffer
>> size is set to 16KB. TlsSocket inherits from the Socket class, that
>> is, it is a type of Socket. The existing Ceph code does not need to
>> distinguish if the socket instance is a plain TCP socket or TLS
>> enabled socket. TlsSocket allows us to separate the TLS specific code
>> from the rest.
>
> I'm not familiar with the SSL interface, but its general outline and
> this 16KB buffer size make me wonder: can we send small packets out
> easily, or does it require some minimum size? I'm thinking of the
> per-message protocol acknowledgements and the protocol heartbeat, for
> instance, which are only a few tens of bytes and need to be delivered
> at some reasonable latency.
>
>> - Introduce SimpleTlsSocket and AsyncTlsSocket classes to implement
>> the behavior specific to Simple and Async Messengers. These classes
>> inherit from TlsSocket. Simple Messenger uses blocking sockets while
>> Async Messenger uses non-blocking sockets. The differences between
>> them will be implemented within these classes. These classes are
>> responsible for interacting with the BIO objects and socket layer for
>> sending and receiving encrypted data.
>>
>> - Introduce a class called SslContext to be a wrapper for the SSL
>> Context object of the OpenSSL library. Each Messenger object can have
>> up to two SslContext objects, one for the client role and the other
>> for the server role. Ceph clients allocate only the client SslContext
>> objects since they only initiate connections. Ceph servers both
>> initiate and accept connections, and thus, allocate both client and
>> server SslContext objects.
>
> I'm not clear on why a server would want two SslContexts instead of
> one, but I trust the code will make this all clearer.
>
>>
>> - Introduce a class called TLS to represent OpenSSL library and be
>> responsible for initializing the library.
>>
>> New configuration parameters are defined to enable TLS. These are
>> “tls”, “tls_client_cert_file”, “tls_server_cert_file”,
>> “tls_client_key_file”, “tls_server_key_file”, and “tls_ca_cert_file”.
>> The tls parameter can take one of three values: “none”, “desired”,
>> “required”. “none” means TLS is not enabled. The default value for
>> this parameter is “none”. For the older Ceph versions, it is
>> considered that “tls” is “none”.
>>
>> “desired” means if both ends of a TCP connection are configured with
>> “desired”, or “required”, the session between them must be a TLS
>> session. Otherwise, the session would be a plain TCP session.  The
>> "desired" value is used temporarily during rolling upgrade from plain
>> TCP sessions to TLS sessions. If one side is configured with
>> “required” and the other side is “none”, then Ceph connection attempts
>> between them will fail.
>>
>> The rolling upgrade from plain TCP sessions to TLS sessions can be
>> done as follows. After a Ceph client or server is upgraded to a TLS
>> supported version, the “tls” parameter is set to “desired”. For Ceph
>> clients, this parameter is read from the ceph.config file, and for the
>> servers, the parameter is dynamically set without restarting the
>> server using the injectargs capability. When all the clients and
>> servers are configured with “tls = desired” then the servers and
>> clients can be configured with “tls = required”. When the “tls” value
>> is changed dynamically to “required”, existing connections initiated
>> or terminated from a server are dropped (and new connections where tls
>> is required are established). Client config files are updated with
>> “tls = required”, and clients can be restarted.
>
> Once the servers switch to "required", if the clients already have
> "desired" set, this should all be transparent to users, right?
>
>
>> We have considered two options for rolling upgrades. The first one
>> requires changing the Ceph protocol to advertise TLS configuration
>> during Ceph handshake. This allows accepting TLS sessions over the
>> existing Ceph ports that are used for plain TCP connections. In this
>> case, the connections are upgraded to the TLS sessions by starting TLS
>> handshake during or immediately after Ceph handshake.
>>
>> The second option is to use a separate set of port numbers for TLS.
>> This does not require changing the existing protocol since the Ceph
>> handshake messages (i.e., the banner and the rest) will be
>> exchanged only after TLS sessions are established. The clients
>> configured with the desired mode attempt to connect to servers over the
>> TLS ports first. If that is not successful, they then attempt over the
>> plain TCP ports. The clients and servers configured with the required
>> mode just use the TLS ports. We appreciate the feedback on these
>> options. Our security team prefers the second option.
>
> Can you talk a little more about why your security team prefers that?
> Just because then you have a blanket promise that all data is
> encrypted? Or is there something specific about startup or envelope
> data they're worried about?
>
> I actually might be inclined to just force a cluster to run all-TLS if
> they want to set it up, rather than having the multiple messengers of
> enabling both. It *would* require more client-side configuration than
> having the TLS be part of the Ceph session setup, though, in addition
> to preventing access by the kernel client. Hrmmm...
> -Greg