On Thu, Apr 21, 2016 at 5:37 PM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
>> I've recently become aware of another problem with own-threads. The
>> threads launched are not reaped, pthread_joined, after a TLS
>> connection disconnects. This is especially problematic with GlusterD
>> as it launches a lot of threads to handle generally short lived
>> connections (volfile fetch, portmapper). This causes GlusterDs mem
>> usage to continually grow, and finally lead to other failures due to
>> memory shortage. I've recently seen a setup with GlusterD memory
>> usage in 10s of GBs of reserved mem and TBs of virt mem. This is
>> easily reproducible as well. I'm still working out a solution for
>> this.
>>
>> While allowing TLS connections with own-threads only will lead to a
>> more stable experience, this is a really bad in terms of our memory
>> consumption. This will badly affect our chances of having 1000s of
>> clients. Making TLS work with epoll would fix this, but I'm not very
>> sure of the effort involved. Could we fix this for 3.8? For 4.0, if
>> we want to default to TLS, we definitely need to fix this.
>
> Maybe it's just my not-so-humble opinion, but reaping threads seems
> like a pretty easy thing to implement. By contrast, the prospects

It is easier to get the threads reaped, and this is what I intend to
do for the next 3.7.x release and 3.8. The simplest solution I can
think of right now is to have a reaper timer run periodically, which
would reap any TLS own-threads that have stopped. The process of
reaping would be as follows:

- The reaper timer is started when the first TLS own-thread is
  created. It wakes up every X seconds and reaps dead threads.
- TLS own-threads need to notify the reaper timer of their demise.
  They do this by pushing their thread IDs to a global queue when
  they exit.
- When the reaper timer is triggered, it reads the thread IDs from
  the queue and calls pthread_join on them.

This should work well.
But I'm not sure this is the simplest way to do the reaping. What do
you think?

> of making TLS (specifically OpenSSL) work reliably with epoll seem
> murky at best. Nothing has been easy with epoll so far, and I don't
> see why we'd expect making it work reliably with OpenSSL's horrible
> API would be the first exception. Fixing one small issue with
> own-thread still seems like the quickest route to a stable TLS
> implementation.

While TLS will get more robust once the problems with own-thread are
fixed, I'm still concerned about memory usage in Gluster-4.0,
particularly because we're aiming to use TLS by default and to have
brick multiplexing. This could lead to a single process launching
1000s of threads to handle TLS connections, which would give Gluster
a large memory footprint. This is my reasoning for trying to get TLS
to work with epoll. I may be overthinking this, and it might not be of
any significance at all.
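
To make the reaper proposal above a bit more concrete, here is a rough
sketch of the queue-and-join scheme using plain pthreads. It is not
GlusterFS code: every name in it (reaper_notify_exit, reaper_run_once,
own_thread_main, demo_spawn_and_reap) is made up for illustration, the
"global queue" is a simple mutex-protected linked list, and a short
sleep loop stands in for the reaper timer that would fire every X
seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical names throughout; none of these are GlusterFS APIs. */

struct dead_thread {
    pthread_t tid;
    struct dead_thread *next;
};

/* The "global queue" of exited thread IDs, guarded by a mutex. */
static pthread_mutex_t dead_lock = PTHREAD_MUTEX_INITIALIZER;
static struct dead_thread *dead_head = NULL;

/* Called by an own-thread as its last step before returning. */
void reaper_notify_exit(void)
{
    struct dead_thread *node = malloc(sizeof(*node));
    if (node == NULL)
        return; /* sketch only: this thread would stay unreaped */
    node->tid = pthread_self();
    pthread_mutex_lock(&dead_lock);
    node->next = dead_head;
    dead_head = node;
    pthread_mutex_unlock(&dead_lock);
}

/* One reaper pass: take the whole queue under the lock, then join
 * outside the lock. Returns the number of threads joined. */
int reaper_run_once(void)
{
    pthread_mutex_lock(&dead_lock);
    struct dead_thread *list = dead_head;
    dead_head = NULL;
    pthread_mutex_unlock(&dead_lock);

    int reaped = 0;
    while (list != NULL) {
        struct dead_thread *next = list->next;
        /* May block briefly: the thread queued its ID just before
         * returning, so it might not have fully exited yet. */
        int rc = pthread_join(list->tid, NULL);
        assert(rc == 0);
        (void)rc;
        reaped++;
        free(list);
        list = next;
    }
    return reaped;
}

/* Stand-in for a TLS own-thread serving one short-lived connection. */
void *own_thread_main(void *arg)
{
    (void)arg;
    /* ... connection handling would happen here ... */
    reaper_notify_exit();
    return NULL;
}

/* Spawn n own-threads, then run reaper passes at a fixed interval
 * until every created thread has been joined. */
int demo_spawn_and_reap(int n)
{
    int created = 0;
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, own_thread_main, NULL) == 0)
            created++;
    }

    /* 10 ms here plays the role of the X-second reaper timer. */
    struct timespec interval = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    int total = 0;
    while (total < created) {
        nanosleep(&interval, NULL);
        total += reaper_run_once();
    }
    return total;
}
```

Calling demo_spawn_and_reap(4) creates four threads and returns once
all of them have been joined. The one design point worth noting is
that the reaper detaches the whole list while holding the lock and
calls pthread_join only after releasing it, so exiting threads never
wait on the reaper to push their IDs.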