Re: [PATCH] prevent slapd from hanging under unlikely circumstances

On Mon, Feb 03, 2020 at 10:38:59AM +1000, William Brown wrote:
> 
> 
> > On 1 Feb 2020, at 12:10, Jay Fenlason <ds389@xxxxxxxxxxxxxxx> wrote:
> > 
> > I have a small FreeIPA deployment of ~6-8 servers running on CentOS
> > 7.7.  Due to the addition and removal of some of the servers, some
> > cruft (tombstones, replication conflicts, etc) has crept into the
> > directory.  I noticed that when I attempted to delete some of the
> > cruft entries, ns-slapd would hang, failing to process requests, or
> > even shut down.

> Can you tell us exactly what entries you noticed and how you attempted to delete them? There are certainly some things like tombstones and such that you shouldn't be touching as they are part of the internal replication state machine.

No, I don't remember what entries they were.  I was following
instructions from:
https://docs.fedoraproject.org/en-US/Fedora/18/html/FreeIPA_Guide/ipa-replica-manage.html
(or maybe elsewhere) using ldapdelete to remove tombstones for a truly
deleted server.

> Knowing what you did will also help us to create a test case and
> reproducers to validate your patch.

I found the bug by doing a series of "ipa-client-install" runs (with
lots of arguments), followed by:
echo ca_host = {a not-firewalled IPA CA} >> /etc/ipa/default.conf
echo [global] > /etc/ipa/installer.conf
echo ca_host = {ditto} >> /etc/ipa/installer.conf
echo {password} | kinit admin
ipa hostgroup-add-member ipaservers --hosts $(hostname -f)
ipa-replica-install --setup-ca --setup-dns --forwarder={ip addr}

followed by the replica install failing due to network issues,
misconfigured firewalls, etc, then
ipa-server-install --uninstall on the host
and ipa-replica-manage del {failed install host}
elsewhere in the mesh, sometimes with ldapdelete of the initial
replication agreement that ipa-replica-manage did not remove.

Rinse, repeat. . .
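
For anyone building a reproducer, one cycle of the steps above can be sketched as a small script. This is only a sketch under assumptions: FORWARDER and the host name passed to cleanup_in_mesh are hypothetical placeholders, the ipa-client-install arguments, ca_host configuration and kinit steps are left out, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of one install / fail / clean-up cycle from the steps above.
# FORWARDER is a hypothetical placeholder; set it for your own mesh.
# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
FORWARDER=${FORWARDER:-192.0.2.1}

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form joins arguments with spaces and drops quoting.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# On the would-be replica: enroll in ipaservers, then attempt the install.
replica_attempt() {
    run ipa hostgroup-add-member ipaservers --hosts "$(hostname -f)"
    run ipa-replica-install --setup-ca --setup-dns --forwarder="$FORWARDER"
}

# On the would-be replica, after the install has failed.
cleanup_on_replica() {
    run ipa-server-install --uninstall
}

# On another server in the mesh; $1 is the host whose install failed.
cleanup_in_mesh() {
    run ipa-replica-manage del "$1"
}
```

Calling replica_attempt, cleanup_on_replica and then cleanup_in_mesh with the failed host name (on the appropriate machines) corresponds to one iteration. Any leftover replication agreement would still need to be removed by hand with ldapdelete.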

Until ipa-replica-install starts failing because the source LDAP
server hangs (because of this bug) during the "starting initial
replication" step.  While debugging that, I discovered that running
ldapdelete on the tombstone entries also caused the LDAP servers to
lock up.


> Thanks for the report :) 

Incidentally, there's another bug, which I have not investigated,
where attempting to ldapdelete a problematic tombstone entry
immediately after restarting the LDAP server returns an error, and
nothing is deleted on the server.  If you do an ldapsearch, and then
an ldapdelete, the entry is removed, but then slapd hangs (this bug
again) and does not respond to searches or deletes (or shutdown
requests) until you kill -9 it.  I don't know how it relates to this
bug.
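
The search-then-delete sequence could be scripted roughly as follows. This is a sketch, not the exact commands used: BIND_DN and SUFFIX are hypothetical placeholders, the tombstone DN has to come from the search output, and DRY_RUN defaults to 1 so nothing is sent to the server. The filter relies on 389 Directory Server returning tombstones only when objectClass=nsTombstone is requested explicitly.

```shell
#!/bin/sh
# Sketch of the ldapsearch-then-ldapdelete sequence described above.
# BIND_DN and SUFFIX are hypothetical placeholders for illustration.
# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
BIND_DN=${BIND_DN:-cn=Directory Manager}
SUFFIX=${SUFFIX:-dc=example,dc=test}

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form joins arguments with spaces and drops quoting.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List the DNs of tombstone entries under the suffix.
find_tombstones() {
    run ldapsearch -x -D "$BIND_DN" -W -b "$SUFFIX" '(objectClass=nsTombstone)' dn
}

# Delete a single entry; $1 is a DN taken from the search output.
delete_entry() {
    run ldapdelete -x -D "$BIND_DN" -W "$1"
}
```

Running find_tombstones and then delete_entry with one of the returned DNs, right after a server restart, is the order that ends in the hang described above. Only try it with DRY_RUN=0 on a throwaway instance.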

    -- JF