Re: memory consumption

Russell Beall <beall@xxxxxxx> · Mon, 16 Apr 2012 14:22:06 -0700

On Apr 16, 2012, at 1:50 PM, Rich Megginson wrote:

    I would still like to know which parameters you set and the values
    you used.

When I first tried this, the change log was set to unlimited (the default), and the purge delay was set to 7 days, I think.  I reduced the purge delay to about 15 minutes and the change log maximum entry age to about 10 minutes.  These were changed using the console, and then the server was restarted.  I'm not sure of the exact values because those settings were removed when I deleted the replication agreement and stopped the changelogging.

              The consumer has processed all updates and also seems
                to exhibit overconsumption of memory.

            Do you see any issue if you don't use replication at all?  
            That is, is this issue related to replication?

        Yes, I haven't seen it jump up all the way yet, but it is
          up to 13GB after only a few loops.

        This was without replication enabled.

        When I re-enabled replication just now to see if the excess
          entries would be replicated to the consumer, I see that
          thousands of entries are being sent over during consumer
          initialization.  It just finished replicating and sent 7639
          entries.  There are only about 800 valid entries.  Searching
          from the base suffix for all available current DN values on
          the master results in only "# numEntries: 855"

    I wonder if they are tombstone entries?  Are you doing ldapdelete
    operations at all?

Could be.  Yes, the entries are deleted using ldapdelete over a file of DN values.  I was incorrect when I mentioned that I was using ldapmodify in this case.

I'm not very familiar with the concept of "tombstone" but I used dbscan to look over the id2entry file, and it contains all 7639 entries that were being sent over.  If the tombstone issue is causing the server to hold onto duplicate entries, then maybe I just have to run deletions in a different way...
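
A rough way to test the tombstone theory, as a sketch only: as far as I understand, the server returns tombstones only when the search filter explicitly asks for objectClass=nsTombstone, so counting those entries and comparing against the excess (7639 - 855 = 6784) should show whether they account for the difference.  The suffix dc=example,dc=com, the bind DN, and the helper name count_ldif_entries are placeholders, not values from this setup.

```shell
#!/bin/sh
# Count entries in LDIF read from stdin by counting "dn:" lines.
# Wrapped DN continuation lines start with a space, so they are not
# counted twice; base64-encoded DNs ("dn::") are counted too.
count_ldif_entries() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^dn:' || true
}

# Hypothetical usage (suffix and bind DN are assumptions):
#
#   ldapsearch -x -LLL -D "cn=Directory Manager" -W \
#       -b "dc=example,dc=com" '(objectClass=nsTombstone)' dn \
#       | count_ldif_entries
```

The count may not match exactly, since I believe the replication RUV entry is also stored as a tombstone, but it should land close to the excess if tombstones are the explanation.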

        After restarting the master and doing another consumer
          initialization, there seems to remain a record of these excess
          entries and the consumer is being initialized with them even
          though the change log is currently empty at 16K.

    Consumer initialization does not use the changelog - it reads the
    entries directly from the database and sends them to the consumer.

        Perhaps there is a problem with the way I am deleting and
          adding entries and they are being duplicated behind the scenes
          somehow???  I'm just using a db2ldif export of those entries
          and running ldapadd over that file...

              Are there any pointers related to this?

            Are you seeing https://fedorahosted.org/389/ticket/51 ?

            When you start to see memory growth, are you using all of
            your cache?

        I am not using MMR (pretty sure unless that came enabled by
          default), nor am I using GSSAPI.

        The memory usage pushes well beyond the cache size.

    The thing about ticket 51 (not sure if it is in the ticket, perhaps
    it is in the linked bugzilla bug) is that the memory growth is only
    seen _if the entry cache is maxed out_.  That is, if you are able to
    keep the entry cache max size well above the actual amount of data
    used, you do not see the memory growth.  We have also run valgrind
    but have not seen any "real" memory leaks.  Note that entries stored
    in the entry cache will be reported as "leaks" because we do not
    free the entry cache at shutdown.  Our best guess for the memory
    growth issues in ticket 51 is that either there is a very subtle
    memory leak related to entry "churn" as entries are evicted and
    stored in the entry cache, or the memory fragmentation that results
    from that churn.

Initially, the entry cache was set to 12G, far in excess of the database in its reduced form.
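
To answer the question of whether the cache is actually full, here is a hedged sketch: the backend monitor entry should expose currententrycachesize and maxentrycachesize, and the ratio of the two shows how much of the entry cache is in use.  The backend name userRoot, the bind DN, and the helper name cache_usage_percent are assumptions.

```shell
#!/bin/sh
# Read monitor output (LDIF) on stdin and print the entry cache usage
# as a whole-number percentage, or "unknown" if the maximum is missing.
cache_usage_percent() {
    awk -F': ' '
        tolower($1) == "currententrycachesize" { cur = $2 }
        tolower($1) == "maxentrycachesize"     { max = $2 }
        END {
            if (max > 0) printf "%d\n", (cur * 100) / max
            else print "unknown"
        }'
}

# Hypothetical usage (backend name and bind DN are assumptions):
#
#   ldapsearch -x -LLL -D "cn=Directory Manager" -W -s base \
#       -b "cn=monitor,cn=userRoot,cn=ldbm database,cn=plugins,cn=config" \
#       '(objectClass=*)' currententrycachesize maxentrycachesize \
#       | cache_usage_percent
```

A value near 100 would mean the cache is maxed out, which is the condition described for ticket 51; a low value alongside 13GB of process memory would point elsewhere.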

              If there is no information about this, is there a
                documentation page that might instruct me in the correct
                way to attach valgrind to the ns-slapd process so I can
                see if there is some kind of huge leak?

            AFAIK you can't attach valgrind to a running process.

            try this:

            service dirsrv stop

            ( . /etc/sysconfig/dirsrv ; \
              . /etc/sysconfig/dirsrv-INSTANCENAME ; \
              valgrind -q --tool=memcheck --leak-check=yes \
                --leak-resolution=high --num-callers=50 \
                --log-file=/var/log/dirsrv/slapd-INSTANCENAME/valgrind.log \
                /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-INSTANCENAME \
                -i /var/run/dirsrv/slapd-INSTANCENAME.pid \
                -w /var/run/dirsrv/slapd-INSTANCENAME.startpid -d 0 ) &

            valgrind will log to
            /var/log/dirsrv/slapd-INSTANCENAME/valgrind.log

            Note that running your server with valgrind will really
            cripple the performance - may be unusable in a production
            environment - you may also run afoul of selinux

            valgrind will not report memory leaks until you shutdown the
            server (just kill -15 <pid of ns-slapd or valgrind>)

      I'll give this a shot and see what happens.

      Looks like we already have some kind of handle on the
        situation since the excess entries are already being reported by
        the entry caches.

    ? Not sure what you mean here.

I mean that the server is creating and holding onto excess entries, and the server itself is reporting this.  That would be a different kind of leak from lost memory, which would need valgrind to detect.

Regards,
Russ.

      Thanks so much for your advice!
      Russ.

--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users