Re: Lock table is out of available lock entries

Mark Reynolds <mreynolds@xxxxxxxxxx> · Wed, 11 Aug 2021 09:01:02 -0400

On 8/11/21 8:52 AM, Kees Bakker wrote:
On 11-08-2021 14:38, Mark Reynolds wrote:
On 8/11/21 8:30 AM, Kees Bakker wrote:
On 11-08-2021 11:49, Kees Bakker wrote:
On 09-08-2021 20:23, Mark Reynolds wrote:
On 8/9/21 1:47 PM, Kees Bakker wrote:
On 09-08-2021 19:25, Mark Reynolds wrote:
On 8/9/21 1:16 PM, Kees Bakker wrote:
On 09-08-2021 18:43, Mark Reynolds wrote:
On 8/9/21 11:20 AM, Kees Bakker wrote:
On 09-08-2021 16:00, Mark Reynolds wrote:
On 8/9/21 8:09 AM, Kees Bakker wrote:
Hi,

When my dirsrv was trying to compact the databases I was 
getting
this
error

[07/Aug/2021:23:59:02.715984489 +0200] - NOTICE - 
bdb_compact -
Compacting databases ...
[07/Aug/2021:23:59:02.765932397 +0200] - NOTICE - 
bdb_compact -
Compacting DB start: userRoot
[07/Aug/2021:23:59:03.518175414 +0200] - NOTICE - 
bdb_compact -
compactdb: compact userRoot - 417 pages freed
[07/Aug/2021:23:59:03.576427786 +0200] - NOTICE - 
bdb_compact -
Compacting DB start: ipaca
[07/Aug/2021:23:59:03.659941533 +0200] - NOTICE - 
bdb_compact -
compactdb: compact ipaca - 419 pages freed
[07/Aug/2021:23:59:03.718445310 +0200] - NOTICE - 
bdb_compact -
Compacting DB start: changelog
[08/Aug/2021:00:00:40.807571334 +0200] - NOTICE -
NSMMReplicationPlugin - changelog program - cl5CompactDBs -
compacting
replication changelogs...
[08/Aug/2021:00:00:54.309357211 +0200] - ERR - libdb -
BDB2055 Lock
table is out of available lock entries
[08/Aug/2021:00:00:54.726504736 +0200] - ERR - bdb_compact -
compactdb: failed to compact changelog; db error - 12 Cannot
allocate
memory
[08/Aug/2021:00:00:54.801571421 +0200] - ERR - libdb -
BDB2055 Lock
table is out of available lock entries
[08/Aug/2021:00:00:54.876618702 +0200] - ERR -
NSMMReplicationPlugin -
changelog program - cl5CompactDBs - Failed to compact
a797bb0b-be1d11eb-88c0b677-613aa2ad; db error - 12 Cannot
allocate
memory
[08/Aug/2021:00:00:57.253006449 +0200] - NOTICE - 
bdb_compact -
Compacting databases finished.

There are about 402k entries in cn=changelog.

I have a few questions
1) is it normal to have so many entries in cn=changelog? On
another
replica I have almost 3M entries Isn't this cleaned up?
2) the number of locks is 50000 (there are two config items).
Should I
increase that number? If so, increase to what?
3) is there maybe something else going on, causing the
exhaustion of
the locks?

Ok, so by default there is no changelog trimming enabled. So 
the
changelog will grow without bounds, which is bad.

How much of this [1] applies? Indeed it says "By default the
Replication Changelog does not use any trimming by default."
So, how is the trimming actually enabled? I have these entries

dn: cn=changelog5,cn=config
cn: changelog5
nsslapd-changelogdir: /var/lib/dirsrv/slapd-EXAMPLE-COM/cldb
nsslapd-changelogmaxage: 30d
objectClass: top
objectClass: extensibleobject

So trimming is set, but it's set to 30 days (30d), that
could/should be
shortened to "7d".

Somehow I doubt that this will do anything.

Ahhh, I thought this was the Replication Changelog, but it is the
"Retro" Changelog.  Ok so check this out:

https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/managing_replication-using_the_retro_changelog_plug_in#Using_the_Retro_Changelog_Plug_in-Trimming_the_Retro_Changelog 

Or update config directly using ldapmodify:

# ldapmodify -D ...
dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-changelogmaxage: 7d
nsslapd-changelogmaxage: 7d

(( Typo? The replace line is without the value ))
yes, sorry...

Hmm, that maxage was already at 2 days

Ok, perhaps you are running into a known bug where indexes created by
plugins are not enabled.  Or might be a config issue?

Try this first.  Set in the retro changelog entry:

nsslapd-changelog-trim-interval: 300   <--- 5 minutes
(Sorry to be picky, but it is nsslapd-changelogtrim-interval. It's
default is 300.)

Restart the server to make sure it was applied.  You can enable
"plugin"
logging to see what it does.

nsslapd-errorlog-level: 65536

wait over 5 minutes, then grep the errors log for "DSRetroclPlugin".
This will tell us what the plugin is doing when it goes to trim the
database.

IMPORTANT, disable plugin logging immediately after you get your 
data.
Set the errorlog level to 0 to go back to default logging.

If the logging shows it's not finding any candidates then it might be
the indexing issue I mentioned above.  In that case you just need to
reindex the retro changelog:

Before changing anything I enabled Plug-in debugging which revealed
something interesting

[root@linge ~]# grep trim
/var/log/dirsrv/slapd-GHS-NL/errors-plugin-debug-20210811-1108
[11/Aug/2021:11:01:49.792450548 +0200] - DEBUG - DSRetroclPlugin -
cltrim: ldrc=0, first_time=1622032141, cur_time=3702600
[11/Aug/2021:11:01:49.842339467 +0200] - DEBUG - DSRetroclPlugin -
retrocl_housekeeping - changelog does not need to be trimmed
[11/Aug/2021:11:06:49.953401407 +0200] - DEBUG - DSRetroclPlugin -
cltrim: ldrc=0, first_time=1622032141, cur_time=3702900
[11/Aug/2021:11:06:50.012249703 +0200] - DEBUG - DSRetroclPlugin -
retrocl_housekeeping - changelog does not need to be trimmed

This shows that the interval is 300 (even if
nsslapd-changelogtrim-interval is not explicitly set in
cn=changelog5,cn=config

Look at cur_time. That is the "uptime" of the system, not UTC.

[root@linge ~]# uptime
 11:21:42 up 42 days, 20:49,  2 users,  load average: 0.15, 0.14, 0.16

Next, assuming this code from retrocl_trim.c is making the decision
to trim

            first_time = retrocl_getchangetime(SLAPI_SEQ_FIRST, 
&ldrc);
            slapi_log_err(SLAPI_LOG_PLUGIN, RETROCL_PLUGIN_NAME,
                          "cltrim: ldrc=%d, first_time=%ld,
cur_time=%ld\n",
                          ldrc, first_time, cur_time);
            if (LDAP_SUCCESS == ldrc && first_time > (time_t)0L &&
                first_time + ts.ts_c_max_age < cur_time) {
                must_trim = 1;
            }

There is a mixup of absolute (first_time) and relative (cur_time)
time. Which of the two is wrong?

This is probably introduced in commit a5029c8091 for Pull Request [1].
The queue function is now called with a relative time.

[1] https://github.com/389ds/389-ds-base/pull/4450
[2] https://github.com/389ds/389-ds-base/issues/4384

Yes I think you found a bug.  Are you building the server yourself?  Can
I send you a code patch to try?

No, I'm "just" a CentOS user, with access to the Github source code.
And the last time I build an rpm was more than 20 years ago in the 
early RedHat days.

Also, I don't want to mess too much with our production system.

Understood.  What version of 389-ds-base are you using?

Thanks,
Mark

Mark

# dsctl slapd-YOUR_INSTANCE stop

# dsctl slapd-YOUR_INSTANCE db2index changelog

# dsctl slapd-YOUR_INSTANCE start

Then wait 5 minutes and see if anything was trimmed.

Mark

dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
cn: Retro Changelog Plugin
nsslapd-attribute: nsuniqueid:targetUniqueId
nsslapd-changelogmaxage: 2d
nsslapd-include-suffix: cn=dns,dc=example,dc=com
nsslapd-plugin-depends-on-named: Class of Service
nsslapd-plugin-depends-on-type: database
nsslapd-pluginDescription: Retrocl Plugin
nsslapd-pluginEnabled: on
nsslapd-pluginId: retrocl
nsslapd-pluginInitfunc: retrocl_plugin_init
nsslapd-pluginPath: libretrocl-plugin
nsslapd-pluginType: object
nsslapd-pluginVendor: 389 Project
nsslapd-pluginVersion: 1.4.3.231
nsslapd-pluginbetxn: on
nsslapd-pluginprecedence: 25
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject

HTH,

Mark

The first entry in cn=changelog is more than two months old. (May
26th
is about
the time I installed this server as a FreeIPA replica.)

dn: changenumber=1,cn=changelog
objectClass: top
objectClass: changelogentry
objectClass: extensibleObject
targetuniqueid: 1e89ebcb-e20111e8-8820f96e-fc0c60a4
changeNumber: 1
targetDn:
idnsname=_ldap._tcp,idnsname=example.com.,cn=dns,dc=example,dc=com
changeTime: 20210526122901Z
changeType: modify
changes:: YWRkOiBzUlZSZWNvc...gplbnRyeXVzbjogMTEyCi0KAA==

Is there a way to see trimming in action? I'm afraid it is not 
doing
anything.

What diagnostic can I enable to see the trimming in action?

I recommend setting up the changelog max age to 7 days:

https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/html/administration_guide/trimming_the_replication_changelog 

Once that the trimming cleans up the changelog the database
compaction
should work without issue.  You can also increase the database
locks to
1 million (that does require a restart of the server to take
effect).

Let's hope that 1 million is enough :-)

It might not be, you can keep bumping it up if needed. There 
is no
limit, or negative impact on db performance.

Mark

HTH,

Mark

[1]
https://directory.fedoraproject.org/docs/389ds/FAQ/changelog-trimming.html 

--
Directory Server Development Team

_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to
389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines:
https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx 

Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure

--
Directory Server Development Team

--
Directory Server Development Team

_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 
389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: 
https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx 

Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct:
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives:
https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx 

Do not reply to spam on the list, report it:
https://pagure.io/fedora-infrastructure

--
Directory Server Development Team

--
Directory Server Development Team
_______________________________________________
389-users mailing list -- 389-users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-users-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-users@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure