On 10/9/19 11:55 AM, Ludwig Krispenz wrote:
Hi William,
I like your radical approach :-)
In my opinion our connection code is getting too complicated by
maintaining two different implementations in parallel - not
separated, but intermingled (and made even more complicated by turbo mode).
So I agree we should have only one, but which one? In my opinion nunc
stans is the theoretically better approach, but nobody is confident
enough to rely on nunc stans alone. The conntable mode has its
problems, especially when handling many concurrent connections, and
worse when they are established almost at the same time (otherwise we
would not have experimented with nunc stans), but it is stable and,
for most use cases, efficient enough.
So reducing the complexity by removing nunc stans (and maybe also
turbo mode), and then doing cleanup and trying to improve the
bottlenecks, would be an acceptable approach to me.
In my opinion the core problem of the "old" connection code is
that the main thread handles both new connections and already
established connections, and so has to iterate over the connection table.
Using an event model looks like the best way to handle this, but if
that doesn't work we need to look for other improvements without
breaking things.
Your suggestion to make the conn table data structure leaner and more
flexible is one option. In Sun DS, before I knew about event queues,
I split the main thread: one thread handling new connections and
multiple threads handling established connections (each covering part
of the conn table) - reusing the existing mechanisms, just splitting
the load (see the sketch below). Maybe we can also think in this direction.
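Roughly, the split could look like the sketch below - with hypothetical
names (conn_slot_t, worker_main), not the real Sun DS or 389 code: one
acceptor thread, plus worker threads that each poll only their own slice
of the conn table.

    #include <poll.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    #define NUM_WORKERS      4
    #define SLOTS_PER_WORKER 256

    /* Hypothetical slot type standing in for the real conn table entry.
     * Slots are assumed to be initialised to fd = -1 elsewhere. */
    typedef struct {
        int fd; /* -1 while the slot is free */
    } conn_slot_t;

    static conn_slot_t conn_table[NUM_WORKERS * SLOTS_PER_WORKER];

    /* Each worker polls only its own slice of the conn table, so no
     * single thread has to walk every established connection. */
    static void *worker_main(void *arg)
    {
        size_t base = (size_t)(uintptr_t)arg * SLOTS_PER_WORKER;
        struct pollfd fds[SLOTS_PER_WORKER];

        for (;;) {
            nfds_t nfds = 0;
            for (size_t i = 0; i < SLOTS_PER_WORKER; i++) {
                if (conn_table[base + i].fd >= 0) {
                    fds[nfds].fd = conn_table[base + i].fd;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }
            if (poll(fds, nfds, 250) > 0) {
                /* dispatch ready fds to the existing connection handling */
            }
        }
        return NULL;
    }

    /* The main thread then only accepts new connections and assigns
     * each one to a free slot in some worker's slice of the table. */
    static void start_workers(void)
    {
        for (size_t w = 0; w < NUM_WORKERS; w++) {
            pthread_t tid;
            pthread_create(&tid, NULL, worker_main, (void *)(uintptr_t)w);
        }
    }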
Handling new connections and established connections is an area where,
out of the box, nunc-stans gives a performance boost (> +10%
throughput). But because the old parts of the code (connection and
connection table) remain, and nunc-stans is not fully integrated with
them, it cannot show its full power (connection table) and it also
leads to some design issues (deadlocks between a nunc-stans job and
its connection, the ability to cancel a nunc-stans job).
In addition, supporting nunc-stans is a concern. I found it very
difficult to investigate/debug, especially when a problem only occurs
in a customer deployment.
I also find it difficult to maintain. For example, even knowing the
root cause (RC) of a deadlock or other bug, I found it very difficult
to say whether a given patch fixes a given RC or not.
The Sun DS approach is less innovative, but it gave good results, at
least for handling a high number of established connections.
I agree with Ludwig: this approach looks promising if we need to
improve DS connection handling and decide to no longer invest in nunc-stans.
thierry
Regards,
Ludwig
On 10/09/2019 01:32 AM, William Brown wrote:
On 9 Oct 2019, at 09:18, Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
On 10/8/19 4:55 PM, William Brown wrote:
Hi everyone,
In our previous catch-up (about 4-5 weeks ago, when I was visiting
Matus/Simon), we talked about nunc-stans and getting it at least
cleaned up and into the code base.
I've been looking at it again, thinking about it and reflecting on
it, and I now have a lot of questions and ideas.
The main question is *why* do we want it merged?
Is it performance? Recently I provided a patch that yielded
approximately a 30% speed-up in whole-server throughput just by
changing our existing connection code.
Is it features? What features do we want from this? We have no
complaints about our current threading model and thread allocations.
Is it maximum number of connections? We can always change the
conntable to a better data structure that would help scale this
number higher (which would also yield a performance gain).
It is mostly about the c10k problem: trying to figure out a way to
use epoll, via an event framework like libevent, libev, or
libtevent, but in a multi-threaded way (at the time none of those
were really thread-safe, or suitable for use in the way we do
multi-threading in 389).
It wasn't about performance, although I hoped that using lock-free
data structures might solve some of the performance issues around
thread contention, and that using a "proper" event framework
might give us some performance boost, e.g. idle-thread processing
using libevent timeouts. I think that using poll() is never going
to scale as well as epoll() in some cases, e.g. lots of concurrent
connections, no matter what sort of data structure you use for the
conntable.
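A minimal sketch of why, with hypothetical accept_new()/handle_io()
helpers standing in for the real 389 code paths: epoll_wait() returns
only the ready fds, so the per-iteration cost tracks activity rather
than the total number of registered connections.

    #include <sys/epoll.h>

    #define MAX_EVENTS 64

    /* listen_fd is already listening; these helpers are hypothetical. */
    void accept_new(int epfd, int listen_fd);
    void handle_io(int fd);

    void event_loop(int epfd, int listen_fd)
    {
        struct epoll_event events[MAX_EVENTS];

        for (;;) {
            /* Unlike poll(), epoll_wait() hands back only the ready fds,
             * so we never scan the full set of registered connections. */
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    accept_new(epfd, listen_fd);
                } else {
                    handle_io(fd);
                }
            }
        }
    }

    /* Registration is done once per connection, e.g.:
     *   struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
     *   epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
     */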
The conntable was bottlenecking because when you had:
| conn | conn | freeslot | ....
it would attempt to lock each conn to see whether it was free or not.
This meant that if a conn was in an I/O, we would block waiting for it
to finish before we could move to the next conn to check if it was
free. After changing to pthread locks, we can now do a trylock: if the
trylock fails, we can infer that the conn slot must be in use, so we
skip it. This is how we got the recent 30% speed-up (my laptop went
from about 4200 conn/sec to over 6000).
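Roughly, the scan now looks like the sketch below, with a hypothetical
conn_slot_t standing in for the real connection struct:

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical slot type; the real table holds the conn structs. */
    typedef struct {
        pthread_mutex_t lock;
        int in_use;
    } conn_slot_t;

    /* Find a free slot without blocking behind slots busy in I/O. */
    conn_slot_t *find_free_slot(conn_slot_t *table, size_t size)
    {
        for (size_t i = 0; i < size; i++) {
            if (pthread_mutex_trylock(&table[i].lock) != 0) {
                /* EBUSY: another thread holds the lock, so the slot
                 * must be in use - skip it instead of waiting. */
                continue;
            }
            if (!table[i].in_use) {
                table[i].in_use = 1;
                pthread_mutex_unlock(&table[i].lock);
                return &table[i];
            }
            pthread_mutex_unlock(&table[i].lock);
        }
        return NULL; /* table full */
    }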
However, the conntable exists to allow the conn struct to be re-used
over and over. There are multiple ways we could solve this. On conn
free, we could append to a freelist to avoid iterating over the
list to find a slot (sketched below). We could use a btreemap and just
alloc/dealloc conns as needed to avoid walking the table to find conns
(this would help the sanitizers with memory too). We could also bring
libevent directly into the main server and have it replace poll.
And even then, I think we should be using flamegraphs to work
out where our real bottlenecks are.
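For example, a freelist of slot indices would make finding a free slot
O(1) instead of a table walk. A minimal sketch with hypothetical names
(conn_freelist_t), not the existing conntable API:

    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical freelist of conn table slot indices. */
    typedef struct {
        pthread_mutex_t lock;
        int *slots;   /* stack of free slot indices */
        size_t count; /* number of free indices on the stack */
    } conn_freelist_t;

    /* O(1) allocation: pop a free index instead of scanning the table. */
    int conn_slot_alloc(conn_freelist_t *fl)
    {
        int idx = -1;
        pthread_mutex_lock(&fl->lock);
        if (fl->count > 0) {
            idx = fl->slots[--fl->count];
        }
        pthread_mutex_unlock(&fl->lock);
        return idx; /* -1 means the table is full */
    }

    /* O(1) release: push the index back when the connection is freed. */
    void conn_slot_free(conn_freelist_t *fl, int idx)
    {
        pthread_mutex_lock(&fl->lock);
        fl->slots[fl->count++] = idx;
        pthread_mutex_unlock(&fl->lock);
    }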
As far as features go, it would be nice to give plugins the
ability to inject event requests and get timeout events, using the same
framework as the main server engine.
The more I have looked at the code - I guess with time and
experience - the more hesitant I am to actually commit to merging
it. It was designed by people who did not understand low-level
concurrency issues and the memory architectures of systems,
I resemble that remark. I suppose you could "turn off" the
lock-free code and use mutexes.
It wasn't meant to be an insult, btw - when I first looked I didn't
understand either, so I resemble this remark too. There are lots of
my own mistakes all over that code.
We have since changed to mutexes, but even with those, the job model -
how jobs queue/dequeue and work along with libevent - still has
issues, because you basically need to protect the items in certain
states. Having a job self-rearm was a nightmare problem. We had to
add a monitor on each ns_job_t to make it work correctly, because of
the callback nature of the code.
When it was initially added there was a job per I/O event, but now
it's a job per connection that lives for the life of the connection.
This has led to the current issue where there is a 1:1 mapping of
slapi_conn_t to ns_job_t; each has its own lock, and that led to
deadlocks ...
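The shape of that deadlock is the classic lock-ordering problem; a
sketch with hypothetical types and lock names, not the real
slapi_conn_t/ns_job_t call paths:

    #include <pthread.h>

    typedef struct job job_t;
    typedef struct {
        pthread_mutex_t c_mutex; /* stands in for the connection lock */
        job_t *job;
    } conn_t;
    struct job {
        pthread_mutex_t monitor; /* stands in for the ns_job_t monitor */
        conn_t *conn;
    };

    /* Connection-side thread: takes the conn lock, then needs the job lock. */
    void conn_side(conn_t *c)
    {
        pthread_mutex_lock(&c->c_mutex);
        pthread_mutex_lock(&c->job->monitor); /* e.g. to re-arm the job */
        pthread_mutex_unlock(&c->job->monitor);
        pthread_mutex_unlock(&c->c_mutex);
    }

    /* Callback-side thread: takes the job lock, then needs the conn lock. */
    void job_side(job_t *j)
    {
        pthread_mutex_lock(&j->monitor);
        pthread_mutex_lock(&j->conn->c_mutex); /* e.g. to update conn state */
        pthread_mutex_unlock(&j->conn->c_mutex);
        pthread_mutex_unlock(&j->monitor);
    }

    /* If both run concurrently, each thread holds the lock the other
     * needs, and neither can proceed. */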
so it has had a huge number of (difficult and subtle) unsafety
issues. And while most of those are fixed, what it does is
duplicate the connection structure from core 389,
It was supposed to eventually replace the connection code.
There is too much in the slapi_conn_t structure that can't be removed
though, and too much locking of that struct throughout the codebase.
It's probably better to try to update slapi_conn_t with new
ideas than to try to replace it.
leading to weird solutions like lock sharing, having to use
monitors, and more. We've tried a few times to push forward with
this, but each time we end up with a lot of complexity and fragility.
So I'm currently thinking a better idea is to step back,
re-evaluate what problem we are actually trying to solve, and then
solve *that*.
The question now is "what is the concern that ns would solve?". Once
we know that, we can make a plan and approach it more
constructively, I think.
I agree. There are probably better ways to solve the problems now.
Maybe even just smaller steps rather than one big-bang replacement :)
At the end of the day, I'm questioning whether we should just rm -r
src/nunc-stans and rethink this whole approach - there are just too
many architectural flaws and limitations in ns that are causing us
headaches.
Ideas and thoughts?
--
Sincerely,
William
—
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs