On 10/9/19 11:55 AM, Ludwig Krispenz wrote:
Hi William,
I like your radical approach :-)
In my opinion our connection code is getting too complicated by
maintaining two different implementations in parallel - not
separated, but intermingled (and made even more complicated by turbo mode).
So I agree we should have only one, but which one? In my opinion nunc
stans is the theoretically better approach, but nobody is confident
enough to rely on nunc stans alone. The conntable mode has its
problems, especially when handling many concurrent connections, and
worse when they are established almost at the same time (otherwise we
would not have experimented with nunc stans), but it is stable and,
for most use cases, efficient enough.
So reducing the complexity by removing nunc stans (and maybe also
turbo mode), and then doing cleanup and trying to improve the
bottlenecks, would be an acceptable approach to me.
In my opinion the core problem of the "old" connection code is
that the main thread handles both new connections and already
established connections, and so has to iterate over the connection table.
Using an event model looks like the best way to handle this, but if
that doesn't work we need to look for other improvements without
breaking things.
Your suggestion to make the conn table data structure leaner and more
flexible is one option. In Sun DS, before I knew about event queues,
I split the main thread: one thread handling new connections and
multiple threads handling established connections (each covering part
of the conn table) - reusing the existing mechanisms, just splitting
the load (see the sketch below). Maybe we can also think in this direction.
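Roughly, the split could look like the sketch below - with hypothetical
names (conn_slot_t, worker_main), not the real Sun DS or 389 code: one
acceptor thread, plus worker threads that each poll only their own slice
of the conn table.

    #include <poll.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_WORKERS      4
    #define SLOTS_PER_WORKER 256

    /* Hypothetical slot type standing in for the real conn table entry.
     * Slots are assumed to be initialised to fd = -1 elsewhere. */
    typedef struct {
        int fd; /* -1 while the slot is free */
    } conn_slot_t;

    static conn_slot_t conn_table[NUM_WORKERS * SLOTS_PER_WORKER];

    /* Each worker polls only its own slice of the conn table, so no
     * single thread has to walk every established connection. */
    static void *worker_main(void *arg)
    {
        size_t base = (size_t)(uintptr_t)arg * SLOTS_PER_WORKER;
        struct pollfd fds[SLOTS_PER_WORKER];

        for (;;) {
            nfds_t nfds = 0;
            for (size_t i = 0; i < SLOTS_PER_WORKER; i++) {
                if (conn_table[base + i].fd >= 0) {
                    fds[nfds].fd = conn_table[base + i].fd;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }
            if (poll(fds, nfds, 250) > 0) {
                /* dispatch ready fds to the existing connection handling */
            }
        }
        return NULL;
    }

    /* The main thread then only accepts new connections and assigns
     * each one to a free slot in some worker's slice of the table. */
    static void start_workers(void)
    {
        for (size_t w = 0; w < NUM_WORKERS; w++) {
            pthread_t tid;
            pthread_create(&tid, NULL, worker_main, (void *)(uintptr_t)w);
        }
    }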
Handling new connections and established connections is an area where,
out of the box, nunc-stans gives a performance boost (> +10%
throughput). But because the old parts of the code (connection and
connection table) remain, and nunc-stans is not fully integrated with
them, it cannot show its full power (connection table) and it also
leads to some design issues (deadlocks between a nunc-stans job and
its connection, the ability to cancel a nunc-stans job).
In addition, supporting nunc-stans is a concern. I found it very
difficult to investigate/debug, especially when a problem only occurs
in a customer deployment.
I also find it difficult to maintain. For example, even knowing the
root cause (RC) of a deadlock or other bug, I found it very difficult
to say whether a given patch fixes a given RC or not.
The Sun DS approach is less innovative, but it gave good results, at
least for handling a high number of established connections.
I agree with Ludwig: this approach looks promising if we need to
improve DS connection handling and decide to no longer invest in nunc-stans.
thierry
Regards,
Ludwig
On 10/09/2019 01:32 AM, William Brown wrote:
On 9 Oct 2019, at 09:18, Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
On 10/8/19 4:55 PM, William Brown wrote:
Hi everyone,
In our previous catch-up (about 4-5 weeks ago, when I was visiting
Matus/Simon), we talked about nunc-stans and getting it at least
cleaned up and into the code base.
I've been looking at it again, thinking about it and reflecting on
it, and I now have a lot of questions and ideas.
The main question is *why* do we want it merged?
Is it performance? Recently I provided a patch that yielded
approximately a 30% speed-up in whole-server throughput just by
changing our existing connection code.
Is it features? What features do we want from this? We have no
complaints about our current threading model and thread allocations.
Is it maximum number of connections? We can always change the
conntable to a better data structure that would help scale this
number higher (which would also yield a performance gain).
It is mostly about the c10k problem: trying to figure out a way to
use epoll, via an event framework like libevent, libev, or
libtevent, but in a multi-threaded way (at the time none of those
were really thread-safe, or suitable for use in the way we do
multi-threading in 389).
It wasn't about performance, although I hoped that using lock-free
data structures might solve some of the performance issues around
thread contention, and that using a "proper" event framework
might give us some performance boost, e.g. idle-thread processing
using libevent timeouts. I think that using poll() is never going
to scale as well as epoll() in some cases, e.g. lots of concurrent
connections, no matter what sort of data structure you use for the
conntable.
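A minimal sketch of why, with hypothetical accept_new()/handle_io()
helpers standing in for the real 389 code paths: epoll_wait() returns
only the ready fds, so the per-iteration cost tracks activity rather
than the total number of registered connections.

    #include <sys/epoll.h>

    #define MAX_EVENTS 64

    /* listen_fd is already listening; these helpers are hypothetical. */
    void accept_new(int epfd, int listen_fd);
    void handle_io(int fd);

    void event_loop(int epfd, int listen_fd)
    {
        struct epoll_event events[MAX_EVENTS];

        for (;;) {
            /* Unlike poll(), epoll_wait() hands back only the ready fds,
             * so we never scan the full set of registered connections. */
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    accept_new(epfd, listen_fd);
                } else {
                    handle_io(fd);
                }
            }
        }
    }

    /* Registration is done once per connection, e.g.:
     *   struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
     *   epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
     */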
The conntable was bottlenecking because when you had:
| conn | conn | freeslot | ....
it would attempt to lock each conn to see whether it was free or not.
This meant that if a conn was in an I/O, we would block waiting for it
to finish before we could move to the next conn to check if it was
free. After changing to pthread locks, we can now do a trylock: if the
trylock fails, we can infer that the conn slot must be in use, so we
skip it. This is how we got the recent 30% speed-up (my laptop went
from about 4200 conn/sec to over 6000).
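Roughly, the scan now looks like the sketch below, with a hypothetical
conn_slot_t standing in for the real connection struct:

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical slot type; the real table holds the conn structs. */
    typedef struct {
        pthread_mutex_t lock;
        int in_use;
    } conn_slot_t;

    /* Find a free slot without blocking behind slots busy in I/O. */
    conn_slot_t *find_free_slot(conn_slot_t *table, size_t size)
    {
        for (size_t i = 0; i < size; i++) {
            if (pthread_mutex_trylock(&table[i].lock) != 0) {
                /* EBUSY: another thread holds the lock, so the slot
                 * must be in use - skip it instead of waiting. */
                continue;
            }
            if (!table[i].in_use) {
                table[i].in_use = 1;
                pthread_mutex_unlock(&table[i].lock);
                return &table[i];
            }
            pthread_mutex_unlock(&table[i].lock);
        }
        return NULL; /* table full */
    }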
However, the conntable exists to allow the conn struct to be re-used
over and over. There are multiple ways we could solve this. On conn
free, we could append to a freelist to avoid iterating over the
list to find a slot (sketched below). We could use a btreemap and just
alloc/dealloc conns as needed to avoid walking the table to find conns
(this would help the sanitizers with memory too). We could also bring
libevent directly into the main server and have it replace poll.
And even then, I think we should be using flamegraphs to work
out where our real bottlenecks are.
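For example, a freelist of slot indices would make finding a free slot
O(1) instead of a table walk. A minimal sketch with hypothetical names
(conn_freelist_t), not the existing conntable API:

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical freelist of conn table slot indices. */
    typedef struct {
        pthread_mutex_t lock;
        int *slots;   /* stack of free slot indices */
        size_t count; /* number of free indices on the stack */
    } conn_freelist_t;

    /* O(1) allocation: pop a free index instead of scanning the table. */
    int conn_slot_alloc(conn_freelist_t *fl)
    {
        int idx = -1;
        pthread_mutex_lock(&fl->lock);
        if (fl->count > 0) {
            idx = fl->slots[--fl->count];
        }
        pthread_mutex_unlock(&fl->lock);
        return idx; /* -1 means the table is full */
    }

    /* O(1) release: push the index back when the connection is freed. */
    void conn_slot_free(conn_freelist_t *fl, int idx)
    {
        pthread_mutex_lock(&fl->lock);
        fl->slots[fl->count++] = idx;
        pthread_mutex_unlock(&fl->lock);
    }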
As far as features go, it would be nice to give plugins the
ability to inject event requests and get timeout events, using the same
framework as the main server engine.
The more I have looked at the code - I guess with time and
experience - the more hesitant I am to actually commit to merging
it. It was designed by people who did not understand low-level
concurrency issues and the memory architectures of systems,
I resemble that remark. I suppose you could "turn off" the
lock-free code and use mutexes.
It wasn't meant to be an insult, btw - when I first looked I didn't
understand either, so I resemble this remark too. There are lots of
my own mistakes all over that code.
We have since changed to mutexes, but even with those, the job model -
how jobs queue/dequeue and work along with libevent - still has
issues, because you basically need to protect the items in certain
states. Having a job self-rearm was a nightmare problem. We had to
add a monitor on each ns_job_t to make it work correctly, because of
the callback nature of the code.
When it was initially added there was a job per I/O event, but now
it's a job per connection that lives for the life of the connection.
This has led to the current issue where there is a 1:1 mapping of
slapi_conn_t to ns_job_t; each has its own lock, and that led to
deadlocks ...
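The shape of that deadlock is the classic lock-ordering problem; a
sketch with hypothetical types and lock names, not the real
slapi_conn_t/ns_job_t call paths:

    #include <pthread.h>

    typedef struct job job_t;
    typedef struct {
        pthread_mutex_t c_mutex; /* stands in for the connection lock */
        job_t *job;
    } conn_t;
    struct job {
        pthread_mutex_t monitor; /* stands in for the ns_job_t monitor */
        conn_t *conn;
    };

    /* Connection-side thread: takes the conn lock, then needs the job lock. */
    void conn_side(conn_t *c)
    {
        pthread_mutex_lock(&c->c_mutex);
        pthread_mutex_lock(&c->job->monitor); /* e.g. to re-arm the job */
        pthread_mutex_unlock(&c->job->monitor);
        pthread_mutex_unlock(&c->c_mutex);
    }

    /* Callback-side thread: takes the job lock, then needs the conn lock. */
    void job_side(job_t *j)
    {
        pthread_mutex_lock(&j->monitor);
        pthread_mutex_lock(&j->conn->c_mutex); /* e.g. to update conn state */
        pthread_mutex_unlock(&j->conn->c_mutex);
        pthread_mutex_unlock(&j->monitor);
    }

    /* If both run concurrently, each thread holds the lock the other
     * needs, and neither can proceed. */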
so it has had a huge number of (difficult and subtle) unsafety
issues. And while most of those are fixed, what it does is
duplicate the connection structure from core 389,
It was supposed to eventually replace the connection code.
There is too much in the slapi_conn_t structure that can't be removed
though, and too much locking of that struct throughout the codebase.
It's probably better to try to update slapi_conn_t with new
ideas than to try to replace it.
leading to weird solutions like lock sharing, having to use
monitors, and more. We've tried a few times to push forward with
this, but each time we end up with a lot of complexity and fragility.
So I'm currently thinking a better idea is to step back,
re-evaluate what problem we are actually trying to solve, and then
solve *that*.
The question now is "what is the concern that ns would solve?". Once
we know that, we can make a plan and approach it more
constructively, I think.
I agree. There are probably better ways to solve the problems now.
Maybe even just smaller steps rather than one big-bang replacement :)
At the end of the day, I'm questioning whether we should just rm -r
src/nunc-stans and rethink this whole approach - there are just too
many architectural flaws and limitations in ns that are causing us
headaches.
Ideas and thoughts?
--
Sincerely,
William
—
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs