On 04/16/2012 03:22 PM, Russell Beall wrote:
On Apr 16, 2012, at 1:50 PM, Rich Megginson wrote:
I would still like to
know which parameters you set and the values you used.
When I first tried this, the change log was set to
unlimited (the default), and the purge delay was set to 7
days, I think. I reduced the purge delay to about 15 minutes,
and the change log entry age to about 10 minutes. These were
changed using the console, and then the server was restarted.
I'm not sure of this exactly because those settings were
deleted when I deleted the replication agreement and stopped
the changelogging.
OK - it should be trimming something - not sure what's going on
here - could be a bug
The consumer has processed all updates and
also seems to exhibit excessive memory consumption.
Do you see any issue if you don't use replication at
all? That is, is this issue related to
replication?
Yes, I haven't seen it jump up all the way yet, but
it is up to 13GB after only a few loops.
This was without replication enabled.
When I re-enabled replication just now to see if
the excess entries would be replicated to the
consumer, I see that thousands of entries are being
sent over during consumer initialization. It just
finished replicating and sent 7639 entries. There are
only about 800 valid entries. Searching from the base
suffix for all available current DN values on the
master results in only "# numEntries: 855"
I wonder if they are tombstone entries? Are you doing
ldapdelete operations at all?
Could be. Yes, the entries are deleted using ldapdelete
over a file of DN values. I was incorrect when I mentioned
that I was using ldapmodify in this case.
Ok, then they are definitely tombstone entries.
The purge setting controls how long tombstone entries are kept
before they are cleaned up. There is a thread inside the directory
server that runs every hour (by default) that cleans up tombstones
that are older than the purge delay.
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html-single/Administration_Guide/index.html#Multi_Master_Replication-Configuring_the_Read_Write_Replicas_on_the_Supplier_Servers
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaPurgeDelay
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#Replication_Attributes_under_cnreplica_cnsuffixName_cnmapping_tree_cnconfig-nsDS5ReplicaTombstonePurgeInterval
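As a rough sketch of how those two replica attributes might be set
with ldapmodify (the suffix dc=example,dc=com and the values - 7 days
and 1 hour, in seconds - are only examples, not your actual settings):

# Example values: keep tombstones for 7 days, run the reaper hourly
ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: cn=replica,cn="dc=example,dc=com",cn=mapping tree,cn=config
changetype: modify
replace: nsDS5ReplicaPurgeDelay
nsDS5ReplicaPurgeDelay: 604800
-
replace: nsDS5ReplicaTombstonePurgeInterval
nsDS5ReplicaTombstonePurgeInterval: 3600
EOF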
Note that these parameters have nothing to do with the changelog -
the changelog has its own separate trimming attributes:
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#cnchangelog5-nsslapd_changelogmaxage_Max_Changelog_Age
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Configuration_Command_and_File_Reference/Core_Server_Configuration_Reference.html#cnchangelog5-nsslapd_changelogmaxentries_Max_Changelog_Records
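And a similar sketch for the changelog side (the 7d age is just an
example value; cn=changelog5 only exists while changelogging is enabled):

# Example: trim changelog records older than 7 days
ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: cn=changelog5,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: 7d
EOF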
I'm not very familiar with the concept of "tombstone" but I
used dbscan to look over the id2entry file, and it contains
all 7639 entries that were being sent over. If the tombstone
issue is causing the server to hold onto duplicate entries,
then maybe I just have to run deletions in a different way...
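For example, one rough way to count the records dbscan sees (the
userRoot backend path and .db4 extension are assumptions for a default
DS9 instance - adjust for yours):

# Each record in the dbscan dump should start with an "id <n>" line
dbscan -f /var/lib/dirsrv/slapd-INSTANCENAME/db/userRoot/id2entry.db4 | grep -c '^id '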
After restarting the master and doing another
consumer initialization, there seems to remain a
record of these excess entries and the consumer is
being initialized with them even though the change log
is currently empty at 16K.
Consumer initialization does not use the changelog - it
reads the entries directly from the database and sends them
to the consumer.
Perhaps there is a problem with the way I am
deleting and adding entries, and they are being
duplicated behind the scenes somehow? I'm just
using a db2ldif export of those entries and running
ldapadd over that file...
Are there any pointers related to this?
Are you seeing https://fedorahosted.org/389/ticket/51 ?
When you start to see memory growth, are you using
all of your cache?
I am not using MMR (pretty sure unless that came
enabled by default), nor am I using GSSAPI.
The memory usage pushes well beyond the cache size.
The thing about ticket 51 (not sure if it is in the ticket,
perhaps it is in the linked bugzilla bug) is that the memory
growth is only seen _if the entry cache is maxed out_. That
is, if you are able to keep the entry cache max size well
above the actual amount of data used, you do not see the
memory growth. We have also run valgrind but have not seen
any "real" memory leaks. Note that entries stored in the
entry cache will be reported as "leaks" because we do not
free the entry cache at shutdown. Our best guess for the
memory growth in ticket 51 is that there is either a
very subtle memory leak related to entry "churn" as entries
are evicted from and added to the entry cache, or memory
fragmentation that results from that churn.
Initially, the entry cache was set to 12G, far in excess of
the database in its reduced form.
But do you eventually see the cache usage grow to at or near the max
cache size?
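One way to check, as a sketch (the userRoot backend name is an
assumption): read the live entry cache counters from the backend
monitor entry and compare them against the configured maximum:

ldapsearch -x -D "cn=Directory Manager" -W -s base \
  -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" \
  currententrycachesize maxentrycachesize currententrycachecount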
If there is no information about this, is
there a documentation page that might instruct
me in the correct way to attach valgrind to the
ns-slapd process so I can see if there is some
kind of huge leak?
AFAIK you can't attach valgrind to a running
process.
try this:
service dirsrv stop
# Source the environment files the init script normally reads, then
# start ns-slapd in the foreground (-d 0) under valgrind:
( . /etc/sysconfig/dirsrv
  . /etc/sysconfig/dirsrv-INSTANCENAME
  valgrind -q --tool=memcheck --leak-check=yes \
    --leak-resolution=high --num-callers=50 \
    --log-file=/var/log/dirsrv/slapd-INSTANCENAME/valgrind.log \
    /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-INSTANCENAME \
    -i /var/run/dirsrv/slapd-INSTANCENAME.pid \
    -w /var/run/dirsrv/slapd-INSTANCENAME.startpid -d 0 ) &
valgrind will log to
/var/log/dirsrv/slapd-INSTANCENAME/valgrind.log
Note that running your server under valgrind will
severely degrade performance - it may be unusable in
a production environment - and you may also run afoul of
SELinux.
valgrind will not report memory leaks until you
shut down the server (just kill -15 <pid of
ns-slapd or valgrind>)
I'll give this a shot and see what happens.
Looks like we already have some kind of handle on the
situation, since the excess entries are already being
reported by the entry caches.
? Not sure what you mean here.
This means that the server is creating and holding onto
excess entries, and the server is reporting this fact. This
would be a different type of leak than lost memory, which
valgrind would be needed to see.
The excess entries are probably tombstone entries - you should see
roughly a tombstone entry being added every time you delete an
entry.
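A quick way to verify this, as a sketch (the suffix is an example):
tombstones are hidden from normal searches, but you can ask for them
explicitly by objectClass:

ldapsearch -x -D "cn=Directory Manager" -W -b "dc=example,dc=com" \
  '(objectClass=nsTombstone)' dn

The number of DNs returned should roughly match the number of entries
deleted since the last purge.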
Regards,
Russ.
Thanks so much for your advice!
Russ.
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users