Re: Error code 51 and replication errors

Rich Megginson <rmeggins@xxxxxxxxxx> · Wed, 22 Oct 2014 11:46:08 -0600



    On 10/22/2014 11:35 AM, Shilen Patel wrote:

    
      Thanks for the information.  I’m
          actually running 6.5 not 6.6.  The latest version I’m seeing for
        6.5 is 1.2.11.15-34.el6_5.  Is that version for 6.5 about the
        same (in terms of bug fixes) as 1.2.11.15-47 in 6.6?
    
    
    Is 1.2.11.15-34.el6_5 the same as 1.2.11.15-47?  No.  -47 has a lot
    more bug fixes.

    
      If so, I’ll check out 1.2.11.15-34 in 6.5.  Otherwise, I’ll
        upgrade to 6.6 first.  Appreciate the help.
      

      Thanks!
      

      — Shilen
      
        
          From: Rich Megginson <rmeggins@xxxxxxxxxx>
          Reply-To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
          Date: Wednesday, October 22, 2014 at 1:10 PM
          To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
          Subject: Re: Error code 51 and replication errors

        
              On 10/22/2014 10:58 AM, Shilen Patel wrote:

              
                1.2.11.15 is a couple of years old?
              
              
              Yes and no.  1.2.11.15 was the starting point for EL6. 
              However, many, many features and fixes have been
              backported from later versions into 1.2.11.15-47 in EL
              6.6.

              
                I had to upgrade to the latest in copr because of
                  another issue that I think was fixed in 1.2.11.30.
              
              
              Has that issue been fixed in 1.2.11.15-47 in EL 6.6?  I
              know a lot of 389 community members running on EL6 were
              using fedorapeople/copr repos because they could not wait
              until those fixes/features were available in EL 6.6.  Now
              that EL 6.6 is out, I encourage you (and anyone else in
              this situation) to stop using fedorapeople/copr builds and
              instead use 1.2.11.15-47 in EL 6.6.

              
                If I’m misunderstanding version numbers in EL vs
                  copr, please let me know.
              
              
              See above.

              
                But my main question is the second question
                  regarding best practices for detecting replication
                  failures and I think that applies to all versions?
              
              
              nsds5replicaLastUpdateStatus
                  is the documented way to get replication status.  The
                  fact that this error is not being reported that way
                  seems like a bug.

                  You can also monitor the errors log.
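
For anyone scripting a check around nsds5replicaLastUpdateStatus, here is a minimal Python sketch. It assumes the attribute value starts with an integer code followed by free text, with 0 meaning success. That layout, the sample value, and the names parse_status and needs_attention are assumptions for illustration, not something confirmed in this thread, so check the values your own version reports. As noted above, this particular failure did not show up in the attribute, so a check like this would not have caught it.

```python
import re

# Assumed layout of the attribute value: "<integer code> <free text>".
# Verify against the values your own server actually reports.
STATUS_RE = re.compile(r"^\s*(?P<code>-?\d+)\s+(?P<text>.*)$")


def parse_status(value):
    """Split a status string into (code, text); code is None if unparseable."""
    m = STATUS_RE.match(value)
    if m is None:
        return None, value.strip()
    return int(m.group("code")), m.group("text").strip()


def needs_attention(value):
    """Treat anything other than an explicit code 0 as worth a look."""
    code, _text = parse_status(value)
    return code != 0


# Illustrative value only, not taken from this thread.
sample = "0 Replica acquired successfully: Incremental update succeeded"
code, text = parse_status(sample)
print(code, text, needs_attention(sample))
```

Unparseable values are flagged rather than ignored, on the theory that a monitoring check should fail loudly when its format assumption stops holding.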

                  
                  As for this particular problem, see 
                    https://fedorahosted.org/389/ticket/47409

                  
                Thanks!
                

                — Shilen
                

                    From: Rich Megginson <rmeggins@xxxxxxxxxx>
                    Reply-To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
                    Date: Wednesday, October 22, 2014 at 12:14 PM
                    To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
                    Subject: Re: Error code 51 and replication errors

                  
                        On 10/22/2014 10:10 AM, Shilen Patel wrote:

                        
                            389-ds-base-1.2.11.32-1.el6.x86_64
                          
                        
                        I would strongly encourage you to use the
                        version provided with EL 6.6, which is
                        389-ds-base-1.2.11.15-47.  It looks like you are
                        using a build from the old rmeggins repo or the
                        newer copr repo.  These are really only for
                        those users who needed critical fixes or
                        features not yet in the "supported" EL6.6
                        version.  I don't know if that will fix your
                        problem, but it will make it a lot easier to
                        support.

                        
                          Thanks!
                          
                            
                            — Shilen
                          
                            
                              From: Rich Megginson <rmeggins@xxxxxxxxxx>
                              Reply-To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
                              Date: Wednesday, October 22, 2014 at 12:07 PM
                              To: "389-users@xxxxxxxxxxxxxxxxxxxxxxx" <389-users@xxxxxxxxxxxxxxxxxxxxxxx>
                              Subject: Re: Error code 51 and replication errors

                            
                                  On 10/22/2014 09:54 AM, Shilen Patel wrote:

                                  
                                    Hi,
                                    

                                    I’m running 1.2.11.32.
                                  
                                  
                                  What is the output of rpm -q 389-ds-base?

                                  
                                    I have 6 replicas (two of which
                                      are read-only).  I ran into an
                                      issue where a DELETE operation
                                      failed on a server with error code
                                      51 (ldap busy).
                                    

                                      [21/Oct/2014:23:44:44
                                        -0400] conn=78160 op=39510
                                        RESULT err=51 tag=107 nentries=0
                                        etime=3 csn=5447282c000300050000
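
To pull lines like this out of an access log automatically, a small Python sketch along these lines may help. The regular expression is modelled only on the single RESULT line quoted above, so other line layouts are simply not matched, and the names parse_result and busy_results are made up for illustration.

```python
import re

# Pattern modelled on the one RESULT line quoted above; other access-log
# line layouts are not handled and will return None.
RESULT_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+conn=(?P<conn>\d+)\s+op=(?P<op>\d+)\s+"
    r"RESULT\s+err=(?P<err>\d+)\s+tag=(?P<tag>\d+)(?P<rest>.*)$"
)

LDAP_BUSY = 51  # the "busy" result code seen in this thread


def parse_result(line):
    """Return a dict for a RESULT line, or None if the line does not match."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    fields = {
        "time": m.group("time"),
        "conn": int(m.group("conn")),
        "op": int(m.group("op")),
        "err": int(m.group("err")),
        "tag": int(m.group("tag")),
    }
    # Remaining key=value pairs (nentries, etime, csn, ...) stay as strings.
    for key, value in re.findall(r"(\w+)=(\S+)", m.group("rest")):
        fields[key] = value
    return fields


def busy_results(lines):
    """Yield parsed RESULT entries whose err code is 51."""
    for line in lines:
        entry = parse_result(line)
        if entry is not None and entry["err"] == LDAP_BUSY:
            yield entry


sample = ("[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 "
          "RESULT err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000")
hits = list(busy_results([sample]))
print(hits)
```

Keeping the csn in the parsed entry makes it possible to look for the same change on the other replicas.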
                                    
                                    
                                    The application retried the
                                      delete several times for a couple
                                      of hours (while the server wasn’t
                                      getting any other requests) and
                                      the result was always the same
                                      (err=51).  Each time that
                                      happened, the error log had the
                                      following:
                                    

                                      [21/Oct/2014:23:44:44
                                        -0400] - Retry count exceeded in
                                        delete
                                    
                                    
                                    My first question is, what
                                      would cause a problem like this?
                                    

                                    I simply restarted that
                                      directory and then the update
                                      succeeded.  However, when the
                                      update went to the other 5
                                      servers, they failed in the same
                                      way and the same error was logged
                                      in their log files.  But the
                                      update wasn’t retried.  It was
                                      just skipped and future updates
                                      via replication succeeded on those
                                      5 servers.
                                    

                                    My second question is, what’s
                                      the best way to monitor for these
                                      types of replication errors?  In this case,
                                      nsds5replicaLastUpdateStatus
                                      did not indicate a problem.  If I
                                      had not been looking at the error
                                      file on those 5 hosts, I’m
                                      wondering how I would have known
                                      that a delete failed to replicate
                                      to them.  If the answer is to just
                                      have something monitoring the
                                      error log files, are there
                                      specific search strings to look
                                      for to separate out updates that
                                      have failed and won’t be retried
                                      from other errors (e.g. temporary
                                      connection issues)?  Just curious
                                      if there is a best practice here.
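
As a starting point for the log-watching approach, here is a rough Python sketch that scans errors-log lines for the "Retry count exceeded" message quoted earlier in this message. The line layout is assumed from that one sample, the second sample line is a made-up placeholder, and the function name find_retry_failures is hypothetical. Whether this string reliably separates updates that will not be retried from transient errors is exactly the open question, so treat it as a heuristic.

```python
import re

# Layout assumed from the one errors-log line quoted in this thread:
#   [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<msg>.*)$")
RETRY_RE = re.compile(r"Retry count exceeded in (?P<op>\w+)")


def find_retry_failures(lines):
    """Return (timestamp, operation) pairs for 'Retry count exceeded' lines."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not in the assumed "[timestamp] message" layout
        r = RETRY_RE.search(m.group("msg"))
        if r is not None:
            found.append((m.group("time"), r.group("op")))
    return found


sample_lines = [
    "[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete",
    # Placeholder line, not real server output.
    "[21/Oct/2014:23:45:00 -0400] - some unrelated message",
]
failures = find_retry_failures(sample_lines)
print(failures)
```

In practice the lines would come from the errors file on each replica, for example by iterating over an open file handle instead of the sample list.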
                                    

                                    Thanks!
                                    

                                    — Shilen
                                    

                                    --
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
                                  
                                  
                        
                        
              
              
    
    