> We should try comparing performance of multi-thread-epoll to
> own-thread, shouldn't be hard to hack own-thread into non-SSL-socket
> case.

Own-thread has always been available on non-SSL sockets, from the day it
was first implemented as part of HekaFS.

> HOWEVER, if "own-thread" implies a thread per network connection, as
> you scale out a Gluster volume with N bricks, you have O(N) clients,
> and therefore you have O(N) threads on each glusterfsd (libgfapi
> adoption would make it far worse)! Suppose we are implementing a
> 64-brick configuration with 200 clients, not an unreasonably sized
> Gluster volume for a scalable filesystem. We then have 200 threads
> per Glusterfsd just listening for RPC messages on each brick. On a
> 60-drive server there can be a lot more than 1 brick per server, so
> multiply threads/glusterfsd by brick count! It doesn't make sense to
> have total threads >= CPUs, and modern processors make context
> switching between threads more and more expensive.

It doesn't make sense to have total *busy* threads >= cores (not CPUs)
because of context switches, but idle threads are very low-cost. Note
that multi-threaded epoll is not free from context-switch issues either.
The real problem with either approach is "whale" servers with large
numbers of bricks apiece, vs. "piranha" servers with relatively few.
That's an unbalanced system, with too little CPU and memory (and
probably disk/network bandwidth) relative to capacity.

That said, I've already conceded that there are probably cases where
multi-threaded epoll will generate more parallelism than own-thread.
However, that only matters up to the point where we hit some other
bottleneck. The question is whether the difference is apparent *to the
user* for any configuration and workload we can actually test. Only
after we have that answer can we evaluate whether the benefit is greater
than the risk (of uncovering even more race conditions in other
components) and the drawback of being unable to support SSL.

> Shyam mentioned a refinement to own-thread where we equally partition
> the set of TCP connections among a pool of threads (own-thread is a
> special case of this).

Some form of this would dovetail very nicely with the idea of
multiplexing multiple bricks onto a single glusterfsd process, which we
need to do for other reasons. (There's a rough sketch of both dispatch
models, including this partitioned variant, a little further down.)

> On the Gluster server side, because of the io-threads translator, an
> RPC listener thread is effectively just starting a worker thread and
> then going back to read another RPC. With own-thread, although RPC
> requests are received in order, there is no guarantee that the
> requests will be processed in the order that they were received from
> the network. On the client side, we have operations such as readdir
> that will fan out parallel FOPS. If you use own-thread approach, then
> these parallel FOP replies can all be processed in parallel by the
> listener threads, so you get at least the same level of race condition
> that you would get with multi-thread-epoll.

You get some race conditions, but not to the same level. As you've
already pointed out yourself, multi-threaded epoll can generate greater
parallelism even among requests arriving on a single connection to a
single volume. That is guaranteed to cause data-structure collisions
that would be impossible otherwise.
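Since we keep throwing these terms around, here is a minimal sketch of
the two dispatch models under discussion: a dedicated blocking reader
per connection (own-thread) versus a small fixed pool of threads, each
running its own epoll loop over the connections assigned to it (the
partitioned variant Shyam described; own-thread is just the degenerate
case where every connection gets its own slot). All of the names below
are invented for illustration - this is not the actual GlusterFS
transport code - and error handling is trimmed.

/* Sketch only: invented names, not the real transport code. */
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define POOL_SIZE 4   /* partitioned model; own-thread is "one thread per fd" */

/* Model 1: own-thread -- a dedicated blocking reader per connection. */
static void *conn_reader(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[4096];

    while (read(fd, buf, sizeof(buf)) > 0) {
        /* decode the RPC here and hand it off to io-threads / workers */
    }
    close(fd);
    return NULL;
}

/* Model 2: partitioned epoll -- each pool thread runs its own epoll loop
 * over the subset of connections assigned to it. */
static int epfds[POOL_SIZE];

static void *epoll_worker(void *arg)
{
    int epfd = (int)(intptr_t)arg;
    struct epoll_event events[64];
    char buf[4096];

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (read(fd, buf, sizeof(buf)) <= 0) {
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            }
            /* otherwise decode + dispatch, same as above */
        }
    }
    return NULL;
}

/* Registering a new connection under either model. */
void add_connection(int fd, int use_own_thread)
{
    pthread_t t;

    if (use_own_thread) {
        pthread_create(&t, NULL, conn_reader, (void *)(intptr_t)fd);
        pthread_detach(t);
    } else {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
        epoll_ctl(epfds[fd % POOL_SIZE], EPOLL_CTL_ADD, fd, &ev);
    }
}

int main(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_t t;
        epfds[i] = epoll_create1(0);
        pthread_create(&t, NULL, epoll_worker, (void *)(intptr_t)epfds[i]);
        pthread_detach(t);
    }
    /* an accept() loop would call add_connection(new_fd, ...) here */
    pause();
    return 0;
}

Either way the decode-and-dispatch work is identical; the argument is
only about how many threads can be pulling requests off a given set of
connections at once, and what that costs.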
Also, let's not forget that either change is applicable on the client
side, in glusterd, in self-heal and rebalance, etc. Many of these have
their own unique concerns with respect to concurrency and reentrancy,
and don't already have io-threads. For example, I've had to fix several
bugs in this area that were unique to glusterd. At least we've begun to
shake out some of these issues with own-thread, though I'm sure there
are plenty of bugs still to be found. With multi-threaded epoll we're
going to have even more issues in this area, and we've barely begun to
discover them. That's not a fatal problem, but it's definitely a CON.

> > * CON: multi-epoll does not work with SSL. It *can't* work with
> > OpenSSL at all, short of adopting a hybrid model where SSL
> > connections use own-thread while others use multi-epoll, which is a
> > bit of a testing nightmare.
>
> Why is it a testing nightmare?

It means having to test *both* sets of code paths, plus the code to
hand off between them or use them concurrently, in every environment -
not just those where we hand off to io-threads.

> IMHO it's worth it to carefully trade off architectural purity

Where does this "architectural purity" idea come from? This isn't about
architectural purity. It's about code that's known to work vs. code
that might perform better *in theory* but also presents some new issues
we'd need to address. I don't like thread-per-connection. I've
recommended against it many times. Whoever made the OpenSSL API so
unfriendly to other concurrency approaches was a fool. Nonetheless,
that's the way the real world is, and *in this particular context* I
think own-thread has a better risk:reward ratio.

> In summary, to back own-thread alternative I would need to see that a)
> the own-thread approach is scalable, and that b) performance data
> shows that own-thread is comparable to multi-thread-epoll in
> performance.

Gee, I wonder who we could get to run those tests. Maybe that would be
better than mere conjecture (including mine).

> Otherwise, in the absence of any other candidates, we have to go with
> multi-thread-epoll.

*Only* on the basis of performance, ignoring the other issues we've
discussed? I disagree. If anything, there seem to be moves afoot to
de-emphasize the traditional NAS-replacement role in favor of more
archival/dispersed workloads. I don't necessarily agree with that, but
it would make the "performance at any cost" argument even less relevant.

P.S. I changed the subject line because I think it's inappropriate to
make this about person vs. person, taking my side or the opposition's.
There has been entirely too much divisive behavior on the list already.
Let's try to focus on the arguments themselves, not who's making them.
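P.P.S. Since the "hybrid" model keeps coming up, here is roughly what it
amounts to at connection-registration time. These are hypothetical names
with stub bodies standing in for the real paths, not our actual
transport code; the point is just that one branch buys us two complete
dispatch paths, plus their interaction, to test in every environment.

/* Hypothetical sketch of the hybrid model -- invented names, stub bodies. */
#include <stdio.h>

static void spawn_own_thread_reader(int fd)
{
    printf("fd %d -> own-thread reader (SSL path)\n", fd);
}

static void assign_to_epoll_pool(int fd)
{
    printf("fd %d -> multi-threaded epoll pool (non-SSL path)\n", fd);
}

/* One branch, two complete code paths to test everywhere. */
static void register_transport(int fd, int uses_ssl)
{
    if (uses_ssl)
        spawn_own_thread_reader(fd);  /* OpenSSL's per-connection state keeps it here */
    else
        assign_to_epoll_pool(fd);
}

int main(void)
{
    register_transport(7, 1);  /* SSL connection */
    register_transport(8, 0);  /* plain connection */
    return 0;
}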