Re: Crash with SEGV after compacting

On 8/3/22 1:11 PM, Niklas Schmatloch wrote:
Hi

My organisation is using a replicated 389-dirsrv. Lately, it has been crashing
every time after compacting.

It is reproducible on our instances by lowering the compactdb-interval to
trigger the compacting:

     dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config set --compactdb-interval 300

Tip - you can use the server instance name in place of the credentials and URL.  It will use LDAPI as long as you run it as root:

    dsconf slapd-INSTANCE backend config set --compactdb-interval 300

or even shorter (without the "slapd-" prefix, if the instance name does not match an argument in dsconf):

    dsconf INSTANCE backend config set --compactdb-interval 300

Makes it easier to use the new tools IMHO.



This is the log:

     [03/Aug/2022:16:06:38.552781605 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: userRoot
     [03/Aug/2022:16:06:38.752592692 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 8 pages freed
     [03/Aug/2022:16:06:44.172233009 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 888 pages freed
     [03/Aug/2022:16:06:44.179315345 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: changelog
     [03/Aug/2022:16:13:18.020881527 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact changelog - 458 pages freed
     dirsrv@auth-alpha.service: Main process exited, code=killed, status=11/SEGV
     dirsrv@auth-alpha.service: Failed with result 'signal'.
     dirsrv@auth-alpha.service: Consumed 2d 6h 22min 1.122s CPU time.

The first steps complete very quickly, but the step before the 458 pages of the
retro-changelog are freed takes several minutes. During this time, dirsrv writes
more than 10 GB and reads more than 7 GB (according to iotop).

After this line is printed, dirsrv crashes within seconds.
I also noticed that, even though it reported freeing a lot of pages, the
retro-changelog does not seem to change in size.
The file `/var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db` is 7.2 GB
before and after the compacting.
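(For cross-checking: the Berkeley DB statistics tool can report how many pages sit on the file's free list, since freed pages are reused internally and do not necessarily shrink the file on disk. A rough sketch, assuming the BDB utilities are installed - on Debian 11 the binary is likely shipped as db5.3_stat in the db5.3-util package - and that the server is stopped first:

    # Show btree statistics for the retro-changelog database file,
    # including the number of pages on the free list.
    db5.3_stat -d /var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db
)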


Debian 11.4
389-ds-base/stable,now 1.4.4.11-2 amd64

Does anyone have an idea how to debug or fix this?

We definitely need a good stack trace from the crash.  Unfortunately I think this doc is slightly outdated, but it's mostly accurate (the core file location is probably wrong): https://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
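On a Debian/systemd setup the post-mortem route would look roughly like this (a sketch, assuming systemd-coredump is handling core files and that the dbgsym packages are available from the debian-debug repositories; exact package names may differ):

    # Install the debugger, the core-dump handler, and debug symbols.
    apt install gdb systemd-coredump
    apt install 389-ds-base-dbgsym   # needs the debian-debug apt sources

    # After the next crash, locate the dump and open it in gdb.
    coredumpctl list ns-slapd
    coredumpctl gdb ns-slapd

    # Inside gdb, save a full backtrace of every thread to a file.
    (gdb) set logging file /tmp/stacktrace.txt
    (gdb) set logging on
    (gdb) thread apply all bt full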

You could also debug it live by attaching gdb to the ns-slapd process (after installing the devel and debuginfo packages) and waiting for the compaction to occur.  Then, when it crashes, get the stack of the crashing thread, or of all threads: (gdb) thread apply all bt full
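Roughly, the live approach looks like this (a sketch; adjust the pid lookup to your setup, and note that attaching pauses the server briefly):

    # Attach gdb to the running server.
    gdb -p $(pidof ns-slapd)

    # Resume the server and wait for the compaction to crash it.
    (gdb) continue

    # When gdb stops on the SIGSEGV, capture the stacks.
    (gdb) bt                         # the crashing thread
    (gdb) thread apply all bt full   # all threads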

Question: is there trimming set up on the retro changelog (retrocl)?  How aggressive are the trimming settings?  I'm not sure if trimming more entries before the next compaction would help or hurt.
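If you want to check or adjust it, something along these lines should work (a sketch using dsconf's retro-changelog plugin commands; the 7-day max-age is only an illustration, not a recommendation):

    # Show the current retro changelog plugin configuration,
    # including any nsslapd-changelogmaxage trimming setting.
    dsconf INSTANCE plugin retro-changelog show

    # Example: trim retro changelog entries older than 7 days.
    dsconf INSTANCE plugin retro-changelog set --max-age "7d"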

Anyway, the server should never crash, so please provide the requested information and we will take a look at it.

Thanks,

Mark


Thanks

--
Directory Server Development Team
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue



