On 8/3/22 1:11 PM, Niklas Schmatloch wrote:
Hi
My organisation is using a replicated 389-dirsrv. Lately, it has been crashing
each time after compacting.
It is reproducible on our instances by lowering compactdb-interval to
trigger the compaction:
dsconf -D "cn=Directory Manager" ldap://127.0.0.1 -w 'PASSWORD_HERE' backend config set --compactdb-interval 300
Tip - you can use the server instance name in place of the credentials
and URL. It will use LDAPI as long as you run it as root:
dsconf slapd-INSTANCE backend config set --compactdb-interval 300
or even shorter (without the "slapd-" if the instance name does not
match an argument in dsconf):
dsconf INSTANCE backend config set --compactdb-interval 300
Makes it easier to use the new tools IMHO.
This is the log:
[03/Aug/2022:16:06:38.552781605 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: userRoot
[03/Aug/2022:16:06:38.752592692 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 8 pages freed
[03/Aug/2022:16:06:44.172233009 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact userRoot - 888 pages freed
[03/Aug/2022:16:06:44.179315345 +0200] - NOTICE - checkpoint_threadmain - Compacting DB start: changelog
[03/Aug/2022:16:13:18.020881527 +0200] - NOTICE - bdb_db_compact_one_db - compactdb: compact changelog - 458 pages freed
dirsrv@auth-alpha.service: Main process exited, code=killed, status=11/SEGV
dirsrv@auth-alpha.service: Failed with result 'signal'.
dirsrv@auth-alpha.service: Consumed 2d 6h 22min 1.122s CPU time.
The first steps finish very quickly, but the step before the 458 pages of the
retro-changelog are freed takes several minutes. During this time dirsrv
writes more than 10 G and reads more than 7 G (according to iotop).
Within seconds after this line is printed, dirsrv crashes.
I also noticed that, even though it reports freeing a lot of pages, the
retro-changelog does not seem to change in size.
The file `/var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db` is 7.2 G
before and after compaction.
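(A sketch of how one might check whether the freed pages simply moved to
BDB's internal free list instead of being returned to the filesystem,
assuming the Berkeley DB utilities are installed; on Debian 11 the package
is db5.3-util and the binary may be named db5.3_stat:)

# Run with the server stopped; reading live BDB files is not reliable.
db_stat -d /var/lib/dirsrv/slapd-auth-alpha/db/changelog/id2entry.db
# Look for "Number of pages on the free list" in the output.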
Debian 11.4
389-ds-base/stable,now 1.4.4.11-2 amd64
Does anyone have an idea how to debug or fix this?
We definitely need a good stack trace from the crash. Unfortunately I think
this doc is slightly outdated, but it's mostly accurate (the core file
location is probably wrong):
https://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
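For example (a sketch; exact steps vary by distro, and on Debian the
debug symbols typically come from a dbgsym package):

# If systemd-coredump captured the crash, open the core directly:
coredumpctl list ns-slapd
coredumpctl gdb ns-slapd
# Or, with a plain core file:
gdb /usr/sbin/ns-slapd /path/to/core
(gdb) thread apply all bt full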
You could also debug it live by attaching gdb to the ns-slapd process
(after installing the devel and debuginfo packages) and waiting for the
compaction to occur. Then, when it crashes, get the stack of the
crashing thread, or of all threads: (gdb) thread apply all bt full
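Something like this (a sketch, assuming gdb and the debug symbols are
installed):

gdb -p $(pidof ns-slapd)
(gdb) continue
# ... wait for the compaction; gdb will stop on the SIGSEGV ...
(gdb) thread apply all bt full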
Question: is trimming set up on the retrocl? How aggressive are the
trimming settings? I'm not sure whether trimming more entries before the
next compaction would help or hurt.
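For reference, retro changelog trimming is controlled by the
nsslapd-changelogmaxage attribute on the plugin entry. A sketch of how to
check and set it (the 7d value is only an example, and a server restart
may be needed for the change to take effect):

ldapsearch -x -H ldap://127.0.0.1 -D "cn=Directory Manager" -W \
    -b "cn=Retro Changelog Plugin,cn=plugins,cn=config" nsslapd-changelogmaxage

ldapmodify -x -H ldap://127.0.0.1 -D "cn=Directory Manager" -W <<EOF
dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-changelogmaxage
nsslapd-changelogmaxage: 7d
EOF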
Anyway, the server should never crash, so please provide the requested
information and we will take a look at it.
Thanks,
Mark
Thanks
--
Directory Server Development Team
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue