Re: Future of nunc-stans

> On 22 Oct 2019, at 07:28, Mark Reynolds <mreynolds@xxxxxxxxxx> wrote:
> 
> 
> On 10/9/19 10:00 PM, William Brown wrote:
>>> On 9 Oct 2019, at 19:55, Ludwig Krispenz <lkrispen@xxxxxxxxxx> wrote:
>>> 
>>> Hi William,
>>> 
>>> I like your radical approach :-)
>>> 
>>> In my opinion our connection code is getting too complicated by maintaining two different implementations in parallel - not separated, but intermingled (and made even more complicated by turbo mode). So I agree we should have only one, but which one? In my opinion nunc stans is the theoretically better approach, but nobody is confident enough to rely on nunc stans alone. The conntable mode has its problems, especially when handling many concurrent connections, and worse when they are established almost at the same time (otherwise we would not have experimented with nunc stans), but it is stable and efficient enough for most of the use cases.
>> I think you nailed it in one - we aren't confident in nunc-stans today, so let's keep what works and improve that. There are already many similar concepts - work queues, threads, even slapi_conn_t. I think it would be possible to bring "nunc-stans ideas" into reworking and improving the current connection code instead.
>> 
>>> So reducing the complexity by removing nunc stans (and maybe also turbo mode), and then doing cleanup and trying to improve the bottlenecks, would be an acceptable approach to me.
>> Agreed. It also means we can make much smaller changes, in a fashion that is easier to control and test, I think.
>> 
>>> In my opinion the core of the problem with the "old" connection code is that the main thread handles both new connections and already established connections, and so has to iterate over the connection table. Using an event model looks like the best way to handle this, but if it doesn't work we need to look for other improvements without breaking things.
>>> Your suggestion to make the conn table data structure more lean and flexible is one option. In Sun DS, before I knew about event queues, I split the main thread: one thread handling new connections and multiple threads handling established connections (each covering part of the conn table) - reusing the existing mechanisms, just splitting the load. Maybe we can also think in this direction.
>> I think so too. We can certainly separate what actually does the polling from what does the accepting or, better, the event management. There are also ideas about having smaller groups of workers to improve thread locality and concurrency.
>> 
>> So maybe I'll put together a patch to remove nunc-stans soon, and start looking at the existing connection code and options to improve it, plus some profiling.
>> 
>> I would still like to hear about my original question as quoted below, though; I think Mark might have some comments :)
> 
> Why nunc-stans?  Because at the time we were terrible at handling many simultaneous connections in large numbers - C10K, etc.  Our competitors performed much better in these scenarios than we did.  I recall several customer cases complaining about "our performance versus theirs" with regard to large numbers of connections.
> 
> So, moving forward, I agree with everyone here.  We should remove nunc-stans but, as William said, try to incorporate some of its concepts into our connection code (eventually).  We should clean up the connection code as much as possible, and remove turbo mode if it does not provide much value.

I think turbo mode was an attempt to shortcut returning to the conntable and then blocking on the connection poll, because the locking strategies at the time weren't as good. I think there is still some value in turbo "for now", but if we can bring in libevent that value diminishes, because we become event driven rather than poll driven.
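
To make that concrete, here is roughly what "event driven" looks like with libevent. This is just a minimal sketch, not how it would actually wire into slapd: handle_accept/handle_read, the connection handling, and the port are made up for illustration.

/* Minimal libevent sketch: accept and readable-data become callbacks,
 * so nothing walks a conntable or re-polls idle connections.
 * All names below are hypothetical, for illustration only. */
#include <event2/event.h>
#include <event2/listener.h>
#include <event2/bufferevent.h>
#include <netinet/in.h>
#include <string.h>

static void
handle_read(struct bufferevent *bev, void *ctx)
{
    /* A worker would pull a decoded operation off the bufferevent here. */
}

static void
handle_event(struct bufferevent *bev, short what, void *ctx)
{
    if (what & (BEV_EVENT_EOF | BEV_EVENT_ERROR)) {
        bufferevent_free(bev); /* closes the fd via BEV_OPT_CLOSE_ON_FREE */
    }
}

static void
handle_accept(struct evconnlistener *listener, evutil_socket_t fd,
              struct sockaddr *addr, int socklen, void *ctx)
{
    struct event_base *base = evconnlistener_get_base(listener);
    struct bufferevent *bev =
        bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(bev, handle_read, NULL, handle_event, NULL);
    bufferevent_enable(bev, EV_READ);
}

int
main(void)
{
    struct sockaddr_in sin;
    struct event_base *base = event_base_new();

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(3389); /* example port only */

    evconnlistener_new_bind(base, handle_accept, NULL,
                            LEV_OPT_CLOSE_ON_FREE | LEV_OPT_REUSEABLE, -1,
                            (struct sockaddr *)&sin, sizeof(sin));
    event_base_dispatch(base); /* blocks, dispatching accept/read events */
    return 0;
}

The point being that once accept and read are callbacks, there is no poll loop to return to, which is exactly the path turbo mode was trying to shortcut.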

>  The one thing that has been frustrating is how complicated the connection code has become - most of us no longer know how it works.  It would be nice to get it to a state that is much more maintainable (especially for engineers new to the code).

Given how much I've looked at it recently, I'm probably the closest to having an understanding of that code, but I certainly won't claim 100% expertise here. 

> 
> I think we should look at improving the connection table as previously suggested, and we should add the additional polling thread that Ludwig mentioned.  We know we will get added performance with just these two changes.  Then we should stress test and evaluate the new behavior, and if need be we can look into more invasive/architectural changes and use some of the ideas from nunc-stans.  (This also means we need a way to properly and consistently test high connection load from multiple clients.)

I think there are a few things we can do besides this. We could also split the thread pool into multiple worker pools, each with its own polling thread: instead of, say, 32 threads in one pool, have two pools of 16 threads each and just load-balance new connections between them. This will help with cache locality and more. A quick win in the short term is a freelist of available conntable slots so we aren't walking the table on allocation; a rough sketch of that is below.
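
Roughly the freelist idea, purely as a sketch: none of these names exist in our tree, and the real thing would have to hook into the existing conntable allocation/release paths and respect their locking.

/* Sketch of a LIFO freelist of conntable slot indexes, so acquiring a
 * slot is O(1) instead of walking the table. All names are hypothetical. */
#include <pthread.h>
#include <stdlib.h>

typedef struct ct_freelist {
    int *slots;             /* stack of free slot indexes */
    size_t top;             /* number of entries currently on the stack */
    size_t capacity;        /* == size of the connection table */
    pthread_mutex_t lock;
} ct_freelist;

static ct_freelist *
ct_freelist_new(size_t capacity)
{
    ct_freelist *fl = calloc(1, sizeof(*fl));
    fl->slots = malloc(capacity * sizeof(int));
    fl->capacity = capacity;
    pthread_mutex_init(&fl->lock, NULL);
    /* Initially every slot is free. */
    for (size_t i = 0; i < capacity; i++) {
        fl->slots[fl->top++] = (int)i;
    }
    return fl;
}

/* Returns a free slot index, or -1 if the table is full. */
static int
ct_freelist_acquire(ct_freelist *fl)
{
    int slot = -1;
    pthread_mutex_lock(&fl->lock);
    if (fl->top > 0) {
        slot = fl->slots[--fl->top];
    }
    pthread_mutex_unlock(&fl->lock);
    return slot;
}

/* Called when a connection slot is released back to the table. */
static void
ct_freelist_release(ct_freelist *fl, int slot)
{
    pthread_mutex_lock(&fl->lock);
    fl->slots[fl->top++] = slot;
    pthread_mutex_unlock(&fl->lock);
}

It would also combine nicely with the multi-pool split: give each worker pool its own freelist over its own slice of the table and round-robin new connections between the pools.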

I'm reviving my old lib389 load testing code so I can make some load tests that will spit out CSVs we can compare between runs, so that we have some better profiling data too.


Anyway, I think we're all in agreement here, so we have a plan. Time for me to do some work then .... 

> 
> Mark
> 
>> 
>>>> The main question is *why* do we want it merged?
>>>> Is it performance? Recently I provided a patch that yielded roughly a 30% speedup in overall server throughput just by changing our existing connection code.
>>>> Is it features? What features are we wanting from this? We have no complaints about our current threading model and thread allocations.
>>>> Is it maximum number of connections? We can always change the conntable to a better data structure that would help scale this number higher (which would also yield a performance gain).
>> —
>> Sincerely,
>> 
>> William Brown
>> 
>> Senior Software Engineer, 389 Directory Server
>> SUSE Labs
>> _______________________________________________
>> 389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
>> To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
>> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
>> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> List Archives: https://lists.fedoraproject.org/archives/list/389-devel@xxxxxxxxxxxxxxxxxxxxxxx
> 
> -- 
> 
> 389 Directory Server Development Team

—
Sincerely,

William Brown

Senior Software Engineer, 389 Directory Server
SUSE Labs
_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@xxxxxxxxxxxxxxxxxxxxxxx



