[389-users] Re: [EXT] Re: container 3.1.1: BDB is failing to recover from hard shutdown

Thierry Bordaz <tbordaz@xxxxxxxxxx> · Mon, 5 Aug 2024 18:13:17 +0200



    On 8/5/24 17:59, Darby, Tim - (tdarby)
      wrote:

    
        I'm looking at everything but I don't think it's Docker at this
        point. It does have the appearance of something killing it off
        with a 60 second timeout though.
      
        
        Tim Darby
        
      
      From: David
          Boreham <david@xxxxxxxxxxxxxxx>

          Sent: Monday, August 5, 2024 08:47

          To: 389-users@xxxxxxxxxxxxxxxxxxxxxxx
          <389-users@xxxxxxxxxxxxxxxxxxxxxxx>

          Subject: [EXT] [389-users] Re: container 3.1.1: BDB is
          failing to recover from hard shutdown
         
      
            External
                  Email

              
        Is docker killing the container before recovery completes?
        

        On Mon, Aug 5, 2024, at 9:16 AM, tdarby@xxxxxxxxxxx wrote:

        
          I have a test instance with two 2.5 container instances
            replicating. I took one down, but not gracefully, and
            restarted it using the 3.1.1 container. It showed that it
            was recovering the RUV and then died 60 seconds after
            container start. I've tried to restart it several times and
            it continually fails after 60 seconds. I've never seen a
            389ds instance fail to recover so this is alarming: Here's
            what I see in the logs:

          
          [02/Aug/2024:23:40:08.164519017 +0000] - INFO - main -
            389-Directory/3.1.1 B2024.213.0201 starting up

          
          [02/Aug/2024:23:40:08.167190043 +0000] - INFO - main -
            Setting the maximum file descriptor limit to: 65535

          
          [02/Aug/2024:23:40:08.324281525 +0000] - INFO -
            PBKDF2_SHA256 - Based on CPU performance, chose 2048 rounds

          
          [02/Aug/2024:23:40:08.328414518 +0000] - INFO -
            ldbm_instance_config_cachememsize_set - force a minimal
            value 512000

          
          [02/Aug/2024:23:40:08.334434881 +0000] - NOTICE -
            bdb_start_autotune - found 126976000k physical memory

          
          [02/Aug/2024:23:40:08.336991774 +0000] - NOTICE -
            bdb_start_autotune - found 101486140k available

          
          [02/Aug/2024:23:40:08.340297223 +0000] - NOTICE -
            bdb_start_autotune - total cache size: 29477568512 B; 

          
          [02/Aug/2024:23:40:08.343560343 +0000] - NOTICE -
            bdb_start - Detected Disorderly Shutdown last time Directory
            Server was running, recovering database.

          
          [02/Aug/2024:23:40:50.311047857 +0000] - INFO -
            slapi_vattrspi_regattr - Because pwdpolicysubentry is a new
            registered virtual attribute , nsslapd-ignore-virtual-attrs
            was set to 'off'

          
          [02/Aug/2024:23:40:50.367322989 +0000] - NOTICE -
            NSMMReplicationPlugin - changelog program -
            _cl5ConstructRUVs - Rebuilding the replication changelog
            RUV, this may take several minutes...

          
          [02/Aug/2024:23:41:06.202467004 +0000] - NOTICE -
            NSMMReplicationPlugin - changelog program -
            _cl5ConstructRUVs - Rebuilding replication changelog RUV
            complete.  Result 0 (Success)

          
    At this point the server already completed successfully the BDB
      recovery and the replication plugin startup (Changelog ruv was
      rebuilt). Unfortunately it has not yet completed the startup of
      all the others plugins and it was not up and running.

      So I agree that the most likely RC is that DS is too slow to
      complete its startup (after a crash) and docker kills it. Was it
      SIGKILL ? it should try SIGTERM first.
    

    thierry

    
          (dies here and the container exits)

          
          -- 

          
          _______________________________________________

          
          389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx

          
          To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx

          
          Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/

          
          List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines

          
          List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx

          
          Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue

          
-- 
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue