Re: 389 directory server crash

Rich Megginson <rmeggins@xxxxxxxxxx> · Fri, 12 Jul 2013 09:55:42 -0600



    On 07/12/2013 08:22 AM, Mitja Mihelič wrote:

    
      On 07/09/2013 03:34 PM, Rich Megginson wrote:

      
        On 07/09/2013 06:43 AM, Mitja Mihelič wrote:

        
          Hi!

          
          We are having problems with some of our 389-DS instances. They
          crash after receiving an update from the provider.

        
        After looking at the stack trace, I think this is https://fedorahosted.org/389/ticket/47391

        
          The crash happened twice after about a week of running without
          problems. The crashes happened on two consumer servers but not
          at the same time.

          The servers are running CentOS 6.x with the following 389-DS
          packages installed:

          389-ds-console-doc-1.2.6-1.el6.noarch
          389-console-1.1.7-1.el6.noarch
          389-adminutil-1.1.15-1.el6.x86_64
          389-dsgw-1.1.10-1.el6.x86_64
          389-ds-base-debuginfo-1.2.11.15-14.el6_4.x86_64
          389-admin-1.1.29-1.el6.x86_64
          389-ds-console-1.2.6-1.el6.noarch
          389-admin-console-doc-1.1.8-1.el6.noarch
          389-ds-1.2.2-1.el6.noarch
          389-ds-base-1.2.11.15-14.el6_4.x86_64
          389-ds-base-libs-1.2.11.15-14.el6_4.x86_64
          389-admin-console-1.1.8-1.el6.noarch

          
          We are in the process of replacing the CentOS 5.x-based
          consumer+provider setup with a CentOS 6.x-based one. For the
          time being, the CentOS 6 machines are acting as consumers for
          the old server. They run for a while and then the replicated
          instances crash, though not at the same time.

          One of the servers did not want to start after the crash,
        

        Can you provide the error messages from the errors log?

      
      I have attached error logs from the provider
      (2013-06-27-provider_error) and the consumer
      (2013-06-27-server_two_error) in question.

       
          so I have run db2index on its database. It's been running for
          four days and it has still not finished.
        

        Try exporting using db2ldif, then importing using ldif2db.

      
      The export process hangs. After an hour strace still shows:

      futex(0x7f5822670ed4, FUTEX_WAIT, 1, NULL

      The error log for this is attached as
      2013-07-10-server_two-ldif_import_hangs.

    
    Are you using db2ldif or db2ldif.pl?  If you are using db2ldif, is
    the server running?  If you have not done so already, please shut
    down the server first and then run db2ldif.
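
    As a minimal sketch of the offline export and re-import sequence
    suggested earlier in the thread, the Python helper below only builds
    the two command lines. It assumes the server is already stopped.
    The helper name, the instance script directory, and the backend name
    are hypothetical placeholders; -n (backend), -a (export output file)
    and -i (import input file) are the usual options of these scripts,
    but check the usage output on your own installation.

```python
import os


def offline_export_import_cmds(instance_dir, backend, ldif_path):
    """Return (export_argv, import_argv) for an offline export and re-import.

    instance_dir: directory holding the per-instance db2ldif/ldif2db
                  scripts (hypothetical, supplied by the caller).
    backend:      backend instance name passed to -n (hypothetical).
    ldif_path:    file that the export writes and the import reads.
    """
    export_argv = [os.path.join(instance_dir, "db2ldif"),
                   "-n", backend, "-a", ldif_path]
    import_argv = [os.path.join(instance_dir, "ldif2db"),
                   "-n", backend, "-i", ldif_path]
    return export_argv, import_argv
```

    Each returned list could be passed to subprocess.check_call() one
    after the other, running the import only if the export succeeded.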

    
    If db2ldif still hangs, then please follow the instructions at
    http://port389.org/wiki/FAQ#Debugging_Hangs to get a stack trace of
    the hung process.
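
    The FAQ linked above is the authoritative procedure. Purely as an
    illustration of the general idea, the sketch below builds a gdb
    command line that attaches to a running process by PID and prints a
    full backtrace of every thread. The helper names are hypothetical,
    and the exact gdb commands recommended by the FAQ may differ.

```python
import shlex


def gdb_backtrace_argv(pid):
    """Build a gdb argv that dumps all thread backtraces of a live process."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "-p", str(pid),                     # attach to the running process
        "-batch",                           # run the commands, then exit
        "-ex", "set pagination off",        # do not pause long output
        "-ex", "thread apply all bt full",  # backtrace of every thread
    ]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

    For example, as_shell(gdb_backtrace_argv(1234)) gives a command whose
    output can be redirected to a file and attached to a reply.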

    
          All I get from db2index now are these outputs:

          [09/Jul/2013:13:29:11 +0200] - reindex db: Processed 65095
          entries (pass 1104) -- average rate 53686277.5/sec, recent
          rate 0.0/sec, hit ratio 0%

        
        How many entries do you have in your database?

      
      The number hovers around 65400. It varies by perhaps 2 user del/add
      operations a month and 20 attribute changes per week, if that.
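
      The reindex progress line quoted above carries several counters in
      one record. As a rough sketch, a small parser can pull them out and
      flag a stall, that is, a recent rate of 0.0/sec. The regular
      expression and the field names are assumptions derived only from
      the single line quoted in this thread, not from any documented log
      format.

```python
import re

# Pattern derived from the one progress line quoted in the thread.
PROGRESS_RE = re.compile(
    r"Processed (?P<entries>\d+) entries \(pass (?P<pass_no>\d+)\) -- "
    r"average rate (?P<avg>[\d.]+)/sec, "
    r"recent rate (?P<recent>[\d.]+)/sec, "
    r"hit ratio (?P<hit>\d+)%"
)


def parse_progress(line):
    """Return the counters from a reindex progress line, or None."""
    # Collapse wrapped lines and repeated spaces before matching.
    normalized = " ".join(line.split())
    match = PROGRESS_RE.search(normalized)
    if match is None:
        return None
    return {
        "entries": int(match.group("entries")),
        "pass_no": int(match.group("pass_no")),
        "average_rate": float(match.group("avg")),
        "recent_rate": float(match.group("recent")),
        "hit_ratio_pct": int(match.group("hit")),
    }


def is_stalled(record):
    """A record with a recent rate of zero means no entries were processed lately."""
    return record["recent_rate"] == 0.0
```

      Run over the whole errors log, this would show whether the entry
      count keeps growing between passes or stays fixed while only the
      pass number increases.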

       
          The other instance did start up, but the replication process
          did not work anymore. I disabled the replication to this host
          and set it up again. I chose "Initialize consumer now" and the
          consumer crashed every time.
        

        Can you provide a stack trace of the core when the server crashes?
        This may be different from the stack trace below.

      
      The last provided stack trace was produced at the last server
      crash. I will provide another stack trace when CONSUMER_ONE
      crashes again. Currently it refuses to crash at initialization
      time and keeps running.

       
          I have enabled full error logging and could find nothing.

          I have read a few threads (not all, I admit) on this list and
          http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
          and tried to troubleshoot.

          
          The crash produced the attached core dump. I could use your
          help with understanding it, as well as any help with the
          crash itself. If more info is needed I will gladly provide it.

          
          Regards, Mitja

          
          --
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
        
        