Re: disk i/o: very high write rates and poor search performance

William Brown <william@xxxxxxxxxxxxxxxx> · Thu, 16 Aug 2018 10:44:33 +1000

On Wed, 2018-08-15 at 11:03 -0600, Rich Megginson wrote:
> On 08/15/2018 10:56 AM, David Boreham wrote:
> > 
> > 
> > On 8/15/2018 10:36 AM, Rich Megginson wrote:
> > > 
> > > Updating the csn generator and the uuid generator will cause a
> > > lot of 
> > > churn in dse.ldif.  There are other housekeeping tasks which
> > > will 
> > > write dse.ldif
> > 
> > But if those things were being done so frequently that the
> > resulting 
> > filesystem I/O showed up on the radar as a potential system-wide 
> > performance issue, that would mean something was wrong somewhere,
> > right?
> 
> I would think so.  Then I suppose the first step would be to measure
> the 
> dse.ldif churn on a "normal" system to get a baseline.

We do have some poor locking strategies in parts of the codebase that,
sadly, I never had time to finish fixing. Access logging comes to mind
as a likely bottleneck: its locking needs a complete rewrite. However,
he says the access log is off. I'm not sure that means the locking
around it is disabled, though.

Other suspects are the operation struct reuse (it takes locks and
grows without bound under high operation counts), the cn=config
locking on many variables, which are locked and unlocked frequently
even for reads (some are at least atomics now, but they still cause
stalls on CPU syncs), and locking in some plugins. Any of these could
be part of the issue.

I also had plans to add profiling into access logs so we could really
narrow down the time hogs in searches/writes, but given the current
access log design it's hard to do. We need more visibility into the
server state when these queries come in, and today we just don't have
it :( 

Finally, it could come down to simpler things like indexes or db
locking for certain queries ...

I think we need the access log enabled, with high-resolution etime,
to get a better idea. I would not be surprised to see a pattern like:

100 fast searches
1 slow search
100 fast searches
1 slow search

That would indicate issues in the logging lock.
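
To make that pattern easy to spot, here is a minimal Python sketch that
scans access log lines for RESULT entries and flags the slow ones. It
is not part of the server or its tools. The line shape it assumes
("conn=N op=N RESULT ... etime=S.NNNNNNNNN") and the names
slow_results, flush_gaps, factor and floor are illustrative
assumptions, so adjust the regex to whatever your log actually
contains.

```python
import re
from statistics import median

# Assumed (illustrative) shape of a RESULT line with high-resolution etime:
#   [...] conn=5 op=12 RESULT err=0 tag=101 nentries=1 etime=0.000214513
RESULT_RE = re.compile(
    r"\bconn=(\d+) op=(-?\d+) RESULT\b.*?\betime=(\d+(?:\.\d+)?)"
)


def slow_results(lines, factor=10.0, floor=0.001):
    """Return (position, conn, op, etime) for unusually slow RESULT lines.

    position counts RESULT lines only, starting at 0. A result is "slow"
    when its etime exceeds max(factor * median etime, floor) seconds.
    """
    parsed = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m:
            parsed.append((len(parsed), int(m.group(1)),
                           int(m.group(2)), float(m.group(3))))
    if not parsed:
        return []
    threshold = max(factor * median(p[3] for p in parsed), floor)
    return [p for p in parsed if p[3] > threshold]


def flush_gaps(slow):
    """Distance, in RESULT lines, between consecutive slow results."""
    positions = [p[0] for p in slow]
    return [b - a for a, b in zip(positions, positions[1:])]
```

If the gaps come out roughly constant (a slow result every N
operations), that points at something periodic, such as a buffer
filling and being flushed, rather than at a particular expensive
query.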

To me, this entire issue indicates that we need better profiling
information in our logs, because tools like strace just don't explain
what's going on. We need to know *why* and *where* threads are getting
stuck, how long each plugin is taking, and where the stalls are.
Without investment in exposing the server's internal state, these
issues will always remain elusive to our users and to us as a team.

----- long description of current access log issues -----

The current access log has a single in-memory buffer protected by a
single mutex. During their operations, threads contend for this mutex
in order to write to the buffer. Normally this is "reasonably fast",
until the buffer fills, at which point it needs to be flushed. The
next search thread that finds the buffer approaching its limit *does
the flushing itself* while holding the log mutex.

At this point, one search is busy writing out the whole access log
buffer, while every other thread queues up behind it waiting to write
into the buffer themselves. The (poor, unlucky) operation thread doing
the flush is stuck on disk IO for however long the write takes, while
everyone else waits. It may not even have begun to send results to its
client yet! Finally, once the write is done, it unlocks and can
complete its operation. This is commonly what causes the "burst"
behaviour of the server.
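
The behaviour described above can be reproduced with a toy model. This
is a Python sketch of the pattern only, not the server's actual C
code: BufferedLog, demo, limit and per_entry_seconds are hypothetical
names, and the "disk write" is simulated with a sleep proportional to
the number of buffered entries.

```python
import threading
import time


class BufferedLog:
    """Toy model: one mutex guards one buffer; whoever fills it flushes it."""

    def __init__(self, sink, limit):
        self._lock = threading.Lock()
        self._buf = []
        self._limit = limit   # flush once this many entries are buffered
        self._sink = sink     # callable taking a list; stands in for disk IO
        self.flushers = []    # names of threads that had to do a flush

    def append(self, entry):
        """Buffer one entry; return the seconds this caller spent here."""
        start = time.perf_counter()
        with self._lock:
            self._buf.append(entry)
            if len(self._buf) >= self._limit:
                # The unlucky caller writes out everyone's entries while
                # still holding the mutex, so every other caller blocks.
                self._sink(self._buf)
                self._buf = []
                self.flushers.append(threading.current_thread().name)
        return time.perf_counter() - start

    def locked(self):
        return self._lock.locked()


def demo(workers=4, ops_per_worker=50, limit=20, per_entry_seconds=0.001):
    """Run several 'operation threads' and return (log, sorted latencies)."""
    def slow_sink(batch):
        time.sleep(per_entry_seconds * len(batch))  # simulated disk write

    log = BufferedLog(slow_sink, limit)
    latencies = []

    def worker():
        for i in range(ops_per_worker):
            latencies.append(log.append("op %d" % i))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, sorted(latencies)
```

Running demo() and looking at the sorted latencies should show most
calls returning almost immediately, with a small number taking about
as long as a flush: the flushing thread itself, plus whichever threads
were queued on the mutex behind it. In this model, a smaller limit
makes each stall shorter but more frequent.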

A temporary fix is to lower the buffer size (so it's written in
smaller, more frequent amounts), but really, the only solutions today
are to put the log on a ramdisk/SSD to make that write "finish
faster", or to use the syslog interface with async enabled.

-- 
Sincerely,

William
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx/message/4AGDTUMTCT4XQOHKHMZ3ITXS3SNS6D2R/