> Are there any routers, middleware, firewalls, IdPs, etc. between the client and the LDAP server? A load balancer?

When this first started happening, the client (a cluster of containers) just spoke to the LDAP server directly over a peering connection. Since the error was "unable to connect to LDAP", I thought perhaps the single LDAP server could not handle it, so I added a load balancer (an AWS NLB) and a second LDAP server. It didn't help. Since this was happening before the load balancer, I don't think it's that. There is an ALB in front of the cluster.
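One way to narrow this down is to open a batch of plain TCP connections both directly to one LDAP server and through the NLB, and compare failure counts. A minimal sketch follows; the hostnames are hypothetical placeholders, not the real endpoints. If only the load-balanced path fails, the NLB or peering layer is suspect; if both fail equally, the problem is closer to the server or the network path itself.

    import socket
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical endpoints: one LDAP server reached directly, and the NLB.
    ENDPOINTS = {
        "direct": ("ldap1.internal.example.com", 389),
        "via-nlb": ("ldap-nlb.example.com", 389),
    }

    def try_connect(addr, timeout=5.0):
        # Plain TCP connect only; the client error happens before LDAP even starts.
        try:
            with socket.create_connection(addr, timeout=timeout):
                return True
        except OSError:
            return False

    def hammer(name, addr, attempts=500, workers=50):
        # Open many connections in parallel, roughly like the ramp-up burst.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda _: try_connect(addr), range(attempts)))
        print(f"{name}: {results.count(False)} failures out of {attempts}")

    for name, addr in ENDPOINTS.items():
        hammer(name, addr)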
-Gary
On 10/15/24 17:26, William Brown wrote:
> These errors are only shown on the client, yes? Is there any evidence of a failed connection in the access log?

Correct, those two different "unable to connect to LDAP" errors appear only on the client. I have searched the logs for various things, but I haven't read them line by line. I don't see "err=1", any fd errors, or "Not listening for new connections - too many fds open".
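As a rough cross-check, a short script can count how many connections 389-ds actually accepted and closed during the test window, to compare against the client-side attempt count. This sketch assumes the default access-log format ("connection from" on accept, "op=-1 fd=N closed" on close); the instance path is hypothetical.

    import re

    ACCESS_LOG = "/var/log/dirsrv/slapd-example/access"  # hypothetical instance name

    opened = closed = 0
    with open(ACCESS_LOG) as f:
        for line in f:
            if "connection from" in line:
                opened += 1
            elif re.search(r"op=-1 fd=\d+ closed", line):
                closed += 1
    print(f"accepted={opened} closed={closed}")

If "accepted" matches the client's attempted connection count, the failures are happening after accept; if it is short, they never reached the server.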
So, that means the error is happening *before* 389-ds gets a chance to accept on the connection.
> We still don't know what the cause *is*, so just tweaking values won't help. We need to know what layer is triggering the error before we make changes.
>
> We encountered a similar issue recently with another load test, where the load tester wasn't averaging its connections; it would launch 10,000 connections at once and hope they all worked. With your load test, is it actually spreading its connections out, or is it bursting?

It's a ramp-up of 500 users logging in and starting their searches. The initial ramp-up is 60 seconds, but the searches and login/logouts run over 6 minutes (a minimal pacing sketch follows the numbers below). I sliced up the logs to see what that first minute was like:

Peak Concurrent Connections: 689
Total Operations: 18770
Total Results: 18769
Overall Performance: 100.0%
Total Connections: 2603 (21.66/sec) (1299.40/min)
- LDAP Connections: 2603 (21.66/sec) (1299.40/min)
- LDAPI Connections: 0 (0.00/sec) (0.00/min)
- LDAPS Connections: 0 (0.00/sec) (0.00/min)
- StartTLS Extended Ops: 2571 (21.39/sec) (1283.42/min)
Searches: 13596 (113.12/sec) (6787.01/min)
Modifications: 0 (0.00/sec) (0.00/min)
Adds: 0 (0.00/sec) (0.00/min)
Deletes: 0 (0.00/sec) (0.00/min)
Mod RDNs: 0 (0.00/sec) (0.00/min)
Compares: 0 (0.00/sec) (0.00/min)
Binds: 2603 (21.66/sec) (1299.40/min)
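For reference, the bursting-versus-spreading distinction William describes, as a minimal sketch with a hypothetical endpoint: a ramped client paces its connection opens over the ramp window, while a bursting client would open them all at once and hope they all succeed.

    import socket
    import time

    def ramped_connections(addr, total=500, ramp_seconds=60):
        # 500 users over 60 seconds = one new connection every 120 ms.
        interval = ramp_seconds / total
        socks = []
        for _ in range(total):
            socks.append(socket.create_connection(addr, timeout=5.0))
            time.sleep(interval)  # pace the opens instead of bursting them
        return socks

    conns = ramped_connections(("ldap-nlb.example.com", 389))  # hypothetical host
    print(f"opened {len(conns)} connections over the ramp window")
    for s in conns:
        s.close()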
With the settings below in place, the test results are in: they still get one LDAP error per test.
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 8192
Suggestions? Should I bump these up more?
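Before bumping them further, it might be worth checking whether the kernel on the LDAP hosts is actually overflowing the listen backlog. A minimal Linux-only sketch that reads the TcpExt counters from /proc/net/netstat; if ListenOverflows/ListenDrops aren't climbing during a test run, larger backlogs won't change anything.

    def listen_drop_counters(path="/proc/net/netstat"):
        with open(path) as f:
            lines = f.read().splitlines()
        counters = {}
        # /proc/net/netstat comes in header/value line pairs per protocol group
        for header, values in zip(lines[::2], lines[1::2]):
            if header.startswith("TcpExt:"):
                names = header.split()[1:]
                nums = [int(v) for v in values.split()[1:]]
                counters.update(zip(names, nums))
        return {k: counters.get(k, 0) for k in ("ListenOverflows", "ListenDrops")}

    print(listen_drop_counters())

Running it before and after a test and diffing the two readings shows whether the backlog was ever the bottleneck.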
Reading these numbers, this doesn't look like the server should be under any stress at all. I have tested with 2 CPU / 4 GB RAM and can easily get 10,000 simultaneous connections launched and accepted by 389-ds.
My thinking at this point is that there is something in between the client and 389 that is not coping.
--
Sincerely,
William Brown
Senior Software Engineer,
Identity and Access Management
SUSE Labs, Australia