Well, this is frustrating.
Our trouble-ticket at Red Hat was updated late in October saying we could expect to see the correction in the "next update". RHDS 12.5 shipped in late November (yay!) and I've been reviewing it. As far as I can tell, the fix for 389-ds-base Issue 6284 was incorporated into code in early August (in commit 15a0b5b9e0c90fc5824752b8d58526ec8e13c256).
But the RHDS 12.5 release notes say nothing about updates to the connection handling, and 12.5 appears to ship 389-ds-base 2.5.2-2, which dates from July 2024. So my read is that RHDS 12.5 doesn't contain the fix for this problem, and we're going to have to wait for 12.6 (Q2 2025?) to see it . . . or maybe 12.7, since a new release of 389-ds-base has to come out before it makes its way into RHDS, so maybe Q4 2025.
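For what it's worth, here's roughly how I've been checking whether that commit is contained in a given upstream release, against a local clone of the 389-ds-base repo. This is just a sketch: the tag name (389-ds-base-2.5.2) and the clone path are my assumptions, so adjust as needed.

import subprocess

# Sketch: check whether the Issue 6284 fix commit is contained in the
# release that RHDS 12.5 appears to be built from. The tag name and the
# clone path are guesses on my part -- adjust for your local checkout of
# https://github.com/389ds/389-ds-base
FIX_COMMIT = "15a0b5b9e0c90fc5824752b8d58526ec8e13c256"
RELEASE_TAG = "389-ds-base-2.5.2"   # assumed upstream tag for 2.5.2
REPO_PATH = "/path/to/389-ds-base"  # local clone of the upstream repo

def commit_in_release(commit: str, tag: str, repo: str) -> bool:
    """True if `commit` is an ancestor of `tag`, i.e. included in that release."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, tag]
    )
    # git exits 0 when the commit is an ancestor, 1 when it is not
    return result.returncode == 0

print(commit_in_release(FIX_COMMIT, RELEASE_TAG, REPO_PATH))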
Can anyone tell me if I'm interpreting the situation and timelines correctly?
Oh, and our problematic query still behaves terribly when run against a RHDS 12.5 server :(
-- Do things because you should, not just because you can. John Thurston 907-465-8591 John.Thurston@xxxxxxxxxx Department of Administration State of Alaska
Hi John,
It is excellent news that you have created a CU case. We will investigate this regression, hopefully soon.
Thanks for the well-described test case; it accelerates resolution. If you have other details regarding the test case, please update the case with them. Would you mind giving us the case number, or making sure that support gets in touch with us?
Best regards
thierry
On 9/18/24 21:37, John Thurston wrote:
Thank you for the pointer to the defect, Thierry. I appreciate the very quick and informative response. It certainly smells like this is what is affecting us.
Our case is a single connection, through which ~32,000 sequential queries are passed. To work around this, we have re-created a DS11 replica, to which we have re-directed this job. On DS12, ~30 minutes are required. With DS11, the job completes in ~2 minutes.
(Our DS12 instance is actually running RHDS, so we have opened a Red Hat support case with the details.)
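In case it helps with the investigation, the job is essentially equivalent to the following rough sketch: one connection, ~32,000 sequential searches, here written with ldap3. The host, bind DN, base DN, and filter are placeholders, not our real values.

import time
from ldap3 import ALL, Connection, Server

# Placeholders -- not our real host, credentials, base DN, or filter.
server = Server("ldap.example.com", get_info=ALL)
conn = Connection(server, "cn=Directory Manager", "secret", auto_bind=True)

start = time.monotonic()
for i in range(32_000):
    # One independent search per iteration, all reusing the same connection.
    conn.search("dc=example,dc=com", f"(uid=user{i})", attributes=["cn", "mail"])
elapsed = time.monotonic() - start

print(f"32,000 sequential searches took {elapsed:.1f} seconds")
conn.unbind()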
-- Do things because you should, not just because you can. John Thurston 907-465-8591 John.Thurston@xxxxxxxxxx Department of Administration State of Alaska
On 9/12/2024 3:33 AM, Thierry Bordaz wrote:
Hi Jon,
Yes, the description is "mostly" correct. We recently found a corner case [1] where large requests (requiring several poll/read calls) can get a high wtime even though there is no worker starvation.
Would you provide a sample of the access log showing this issue?
[1] https://github.com/389ds/389-ds-base/issues/6284
regards
thierry
On 9/12/24 01:29, John Thurston wrote:
I have a new instance of 2.4.5, on which I'm seeing a very high* 'wtime' in the access log.
From https://www.port389.org/docs/389ds/design/access-log-new-time-stats-design.html I read:
- wtime - This is the amount of time the operation was waiting in the work queue before being picked up by a worker thread.
Is this still an accurate description of 'wtime'?
If true, I suspect the high values I'm seeing have nothing to do with the version of the software I'm running, and everything to do with the system on which the software is running. Work has arrived, and been queued, but there aren't enough worker-threads to keep the queue serviced in a timely manner.
* 'high' as in 3,000% longer than what I see on a totally different system running 1.4.4
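For reference, here is the rough sort of script I used to compare the two systems, pulling wtime values out of the access log. It assumes wtime appears as wtime=<seconds> on RESULT lines, which is what I see in our logs; the instance name in the path is a placeholder.

import re
import statistics

# Placeholder instance name in the path; wtime is assumed to appear as
# "wtime=<seconds>" on RESULT lines, which is what I see in our logs.
ACCESS_LOG = "/var/log/dirsrv/slapd-EXAMPLE/access"
WTIME_RE = re.compile(r"wtime=(\d+\.\d+)")

wtimes = []
with open(ACCESS_LOG) as fh:
    for line in fh:
        match = WTIME_RE.search(line)
        if match:
            wtimes.append(float(match.group(1)))

if wtimes:
    print(f"operations: {len(wtimes)}")
    print(f"mean wtime: {statistics.mean(wtimes):.6f}s")
    print(f"max  wtime: {max(wtimes):.6f}s")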
-- Do things because you should, not just because you can. John Thurston 907-465-8591 John.Thurston@xxxxxxxxxx Department of Administration State of Alaska