Thank you for your interest. We are in the process of making the code
available. I will announce it here when it is available.

Regards,
Kadir

On Wed, Mar 7, 2018 at 6:39 AM, Junwang Zhao <zhjwpku@xxxxxxxxx> wrote:
> Hi Kadir,
>
> I think this is something that I once wanted to implement during my
> GSoC project: on-the-wire encryption. I was not able to finish it,
> so I am really curious to see your code. Can you share a branch?
>
> Regards,
> Zhao
>
> On Wed, Mar 7, 2018 at 10:12 AM, Kadir Ozdemir <kozdemir@xxxxxxxxxxxxxx> wrote:
>> Hi All,
>>
>>
>> We cannot deploy Ceph in our data centers without encrypting its
>> network traffic. We have decided to implement SSL/TLS within the
>> Messenger layer using the OpenSSL library on our jewel branch.
>> Although our implementation has not been completed, we do not see any
>> obstacles to completing and deploying it. I would like to give an
>> overview, sketch the design in the rest of this email, and get your
>> feedback. If the community is interested in merging this effort
>> upstream, we will be very happy to collaborate and contribute.
>>
>> The two main structures of the OpenSSL library are SSL Context and
>> SSL. An SSL Context object holds certificates, a private key, and
>> options regarding the TLS protocol and algorithms. An SSL Context
>> object provides the context for creating SSL objects, which represent
>> SSL sessions. SSL objects are responsible for encryption, session
>> handshake, and renegotiation, among other things.
>>
>> Ceph Messengers have implemented somewhat sophisticated control logic
>> over the socket layer, and require a different programming model than
>> the simple way of using OpenSSL. OpenSSL has a rich set of APIs at
>> different abstraction levels.
>> In order to decouple the TLS operations
>> from the underlying I/O layer (that is, in our case the socket layer),
>> OpenSSL provides an abstraction for handling input and output at the
>> encrypted side of TLS, which is referred to as a BIO (Basic Input
>> Output). We will be using a pair of memory BIOs (i.e., input and
>> output) for buffering encrypted data to be received from or sent to
>> the socket layer. BIOs allow us to preserve the implementation of the
>> existing socket handling logic within the Ceph networking layer.
>>
>> When TLS is enabled on top of a TCP connection, TLS inserts handshake
>> and renegotiation messages into the TCP stream and encapsulates the
>> plaintext stream of bytes transferred by an application into TLS
>> records. An application in our context is a Ceph Messenger. The TLS
>> handshake happens at the beginning of a TLS session as part of
>> establishing the session. Before any application data can be sent or
>> received over a TLS session, the TLS handshake has to be completed.
>> During the handshake, the endpoints of the session are verified using
>> PKI certificates, and an encryption key is established, among other
>> things.
>>
>> Assuming that the TLS handshake is completed (after the TCP
>> connection is established), we can now trace the send path. The send
>> path starts with calling SSL_write(), which takes the handle for the
>> SSL object, the address of a data buffer, and the length of the
>> buffer. The SSL object encrypts the data into one or more TLS records
>> using its internal buffer and then writes the TLS records to the
>> output BIO. When SSL_write() succeeds, we know that the data is
>> encrypted and copied to the internal buffer of the output BIO. The
>> next step is to read the encrypted data from the output BIO, and
>> write it to the socket. In order to do this operation, a separate
>> buffer needs to be used to copy data from the output BIO to the
>> socket.
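
The memory-BIO pattern described above can be tried out in a few lines of Python, whose standard ssl module wraps OpenSSL and exposes the same idea as ssl.MemoryBIO. This is only an illustrative sketch, not the Ceph C++ code: the function names, the bytearray standing in for a socket, and the host name are all hypothetical. It shows the handshake stage (which needs no certificates to start): the SSL object writes its first record to the output BIO, reports that it wants to read, and the caller drains the output BIO toward the socket.

```python
import ssl


def drain_output_bio(outgoing: ssl.MemoryBIO, fake_socket: bytearray) -> int:
    """Copy all pending encrypted bytes from the output BIO to the 'socket'."""
    data = outgoing.read()      # empties the BIO; returns b"" if nothing is pending
    fake_socket.extend(data)    # stand-in for send() on a real socket
    return len(data)


def start_client_handshake() -> bytes:
    """Run the first client handshake step and return the bytes for the wire."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming = ssl.MemoryBIO()  # encrypted data received from the socket
    outgoing = ssl.MemoryBIO()  # encrypted data to be sent to the socket
    # Hypothetical host name; nothing is resolved or contacted here.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname="osd.example.invalid")

    wire = bytearray()
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # The SSL object has queued its ClientHello in the output BIO and
        # now needs encrypted input from the peer before it can continue.
        pass
    drain_output_bio(outgoing, wire)
    return bytes(wire)


if __name__ == "__main__":
    first_flight = start_client_handshake()
    print(len(first_flight), "encrypted bytes ready for the socket")
```

In a real connection the caller would next receive bytes from the socket, push them into the input BIO with incoming.write(), and repeat the call that reported it wanted to read.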
>>
>> TLS can be in the middle of renegotiating while the application
>> attempts to receive or send data. Please note that the SSL object can
>> only write to or read from a BIO in the context of the caller (i.e.,
>> the calling thread). This is one of the reasons it fails SSL_write or
>> SSL_read calls: to inform the caller that it wants to read or write.
>> In this case, the (SSL_write or SSL_read) operation needs to be
>> repeated after the necessary action is taken. If the SSL object needs
>> to read, then the application needs to read more encrypted data from
>> the socket, and write it to the input BIO. If the SSL object needs to
>> write, then the data from the output BIO needs to be read and written
>> to the socket.
>>
>> The read path is a bit more complicated as the SSL object may have
>> some remaining plaintext from the earlier SSL_read operation. So, the
>> application needs to attempt to read from the SSL object first using
>> SSL_read(). If there is no leftover data, then SSL_read() will return
>> zero bytes. Then, the application needs to check why the read failed
>> using SSL_get_error(), as in the case of an SSL_write() failure. If
>> the SSL layer wants to read, as expected when there is no data in the
>> SSL object, then the application needs to receive more encrypted data
>> from the socket, push this data to the input BIO using BIO_write(),
>> and attempt to read the plaintext again using the SSL_read() operation
>> on the SSL object.
>>
>> There are two main objectives that shape the design to be described
>> here. The first one is to change the existing Ceph code and behavior
>> minimally. The second one is to support Simple and Async Messengers.
>> The main ideas behind the design are as follows:
>>
>> - Introduce a class called Socket to replace the socket descriptor in
>> the Pipe and AsyncConnection classes. Socket will wrap the socket
>> operations used currently in both Messengers. The socket descriptor
>> will be an attribute of Socket.
>> Essentially, the system calls for
>> socket operations such as send and recv will be replaced by the
>> corresponding member functions in this class. This is to achieve the
>> minimal code change objective. Socket implements the plain TCP socket.
>>
>> - Introduce a class called TlsSocket to implement TLS specific
>> behavior common to both Simple and Async Messengers, such as
>> retrieving SSL Context, initiating TLS handshake, and reading from and
>> writing to the SSL object. It maintains two sets, each of which
>> contains a lock, a buffer, and a BIO object: one set for the receive
>> path and the other for the send path. The locks serialize the access
>> to SSL, BIO, buffer and socket on the receive and send paths. The
>> buffer size is set to 16KB. TlsSocket inherits from the Socket class,
>> that is, it is a type of Socket. The existing Ceph code does not need
>> to distinguish whether the socket instance is a plain TCP socket or a
>> TLS enabled socket. TlsSocket allows us to separate the TLS specific
>> code from the rest.
>>
>> - Introduce SimpleTlsSocket and AsyncTlsSocket classes to implement
>> the behavior specific to Simple and Async Messengers. These classes
>> inherit from TlsSocket. Simple Messenger uses blocking sockets while
>> Async Messenger uses non-blocking sockets. The differences between
>> them will be implemented within these classes. These classes are
>> responsible for interacting with the BIO objects and socket layer for
>> sending and receiving encrypted data.
>>
>> - Introduce a class called SslContext to be a wrapper for the SSL
>> Context object of the OpenSSL library. Each Messenger object can have
>> up to two SslContext objects, one for the client role and the other
>> for the server role. Ceph clients allocate only the client SslContext
>> objects since they only initiate connections. Ceph servers both
>> initiate and accept connections, and thus, allocate both client and
>> server SslContext objects.
>>
>> - Introduce a class called TLS to represent the OpenSSL library and
>> be responsible for initializing the library.
>>
>> New configuration parameters are defined to enable TLS. These are
>> “tls”, “tls_client_cert_file”, “tls_server_cert_file”,
>> “tls_client_key_file”, “tls_server_key_file”, and “tls_ca_cert_file”.
>> The tls parameter can take one of three values: “none”, “desired”,
>> “required”. “none” means TLS is not enabled. The default value for
>> this parameter is “none”. Older Ceph versions are treated as having
>> “tls” set to “none”.
>>
>> “desired” means if both ends of a TCP connection are configured with
>> “desired”, or “required”, the session between them must be a TLS
>> session. Otherwise, the session would be a plain TCP session. The
>> “desired” value is used temporarily during a rolling upgrade from
>> plain TCP sessions to TLS sessions. If one side is configured with
>> “required” and the other side is “none”, then Ceph connection attempts
>> between them will fail.
>>
>> The rolling upgrade from plain TCP sessions to TLS sessions can be
>> done as follows. After a Ceph client or server is upgraded to a TLS
>> supported version, the “tls” parameter is set to “desired”. For Ceph
>> clients, this parameter is read from the ceph.config file, and for the
>> servers, the parameter is dynamically set without restarting the
>> server using the injectargs capability. When all the clients and
>> servers are configured with “tls = desired”, the servers and clients
>> can be configured with “tls = required”. When the “tls” value is
>> changed dynamically to “required”, existing connections initiated or
>> terminated from a server are dropped (and new connections where tls
>> is required are established). Client config files are updated with
>> “tls = required”, and clients can be restarted.
>>
>> We have considered two options for rolling upgrades.
>> The first one
>> requires changing the Ceph protocol to advertise TLS configuration
>> during Ceph handshake. This allows accepting TLS sessions over the
>> existing Ceph ports that are used for plain TCP connections. In this
>> case, the connections are upgraded to TLS sessions by starting the
>> TLS handshake during or immediately after the Ceph handshake.
>>
>> The second option is to use a separate set of port numbers for TLS.
>> This does not require changing the existing protocol since the Ceph
>> handshake messages (i.e., the banner and the rest) will be exchanged
>> only after TLS sessions are established. Clients configured with the
>> desired mode attempt to connect to servers over the TLS ports first.
>> If that is not successful, they then attempt over the plain TCP
>> ports. The clients and servers configured with the required mode just
>> use the TLS ports. We appreciate feedback on these options. Our
>> security team prefers the second option.
>>
>> The other parameters, the parameters for certificate and key files,
>> hold the locations (i.e., paths) of the corresponding files.
>>
>> The existing Ceph authentication protocol works as before over TLS
>> since this design is compatible with it. The design does NOT support
>> the kernel rbd module (krbd). We are planning to use librbd (via the
>> user space rbd-nbd client) for the block use cases. The overhead of
>> rbd-nbd is mostly around 10 to 15% based on the fio runs on CentOS 7.
>> Our initial performance runs show that TLS overhead should also be
>> about 10 to 15% based on the rados bench throughput tests. More
>> performance characterization needs to be done to get more reliable
>> results.
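
To make the three values of the “tls” parameter concrete, the outcome for each pair of settings described earlier in this message can be written as a small decision function. This is a sketch in Python for readability only; the names TlsMode and negotiate are hypothetical and do not come from the Ceph code, and the rule is taken directly from the description of “none”, “desired”, and “required” above.

```python
from enum import Enum


class TlsMode(Enum):
    NONE = "none"
    DESIRED = "desired"
    REQUIRED = "required"


def negotiate(local: TlsMode, remote: TlsMode) -> str:
    """Return "tls" or "plain" for a pair of endpoint settings.

    Raises ConnectionError when one side requires TLS and the other
    side has it disabled, since that connection attempt must fail.
    """
    modes = {local, remote}
    if TlsMode.NONE in modes:
        if TlsMode.REQUIRED in modes:
            raise ConnectionError("tls is required on one side and none on the other")
        # none/none or none/desired: fall back to a plain TCP session.
        return "plain"
    # Both sides are desired or required: the session must be TLS.
    return "tls"


if __name__ == "__main__":
    for a in TlsMode:
        for b in TlsMode:
            try:
                result = negotiate(a, b)
            except ConnectionError:
                result = "fail"
            print(f"{a.value:>8} / {b.value:<8} -> {result}")
```

During a rolling upgrade every endpoint passes through “desired”, so at no point does a “required” endpoint meet a “none” endpoint as long as the switch to “required” waits until all endpoints are at least “desired”.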
>>
>> Thanks,
>> Kadir
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html