Re: Strange Disk IO issue

On 05/16/2012 04:06 PM, Nathan Kinder wrote:
On 05/16/2012 01:09 PM, Brad Schuetz wrote:
On 05/16/2012 11:54 AM, Nathan Kinder wrote:
On 05/16/2012 11:19 AM, Brad Schuetz wrote:
On 05/16/2012 06:16 AM, Paul Robert Marino wrote:
The exact timing of the issue is too strange. Is there a backup job
running at midnight, or some other timed job that could be eating the
RAM or disk IO? Possibly one that relies on LDAP queries that
would otherwise be innocuous.


It doesn't happen at midnight; it's 24 hours from when the process was
started. So if I restart dirsrv at 3:17pm on Wednesday, then at around
3:17pm on Thursday that server will go to 100% disk IO usage.
The default tombstone purge interval is 1 day (86400 seconds), which seems
to fit what you are seeing. The tombstone reap thread starts every 24 hours
to find tombstone entries that can be deleted. The default retention
period for tombstones is 1 week (604800 seconds). It is possible that you
have a large number of tombstone entries that need to be deleted. This
happens independently on each of your server instances. It is controlled by
the "nsDS5ReplicaTombstonePurgeInterval" and "nsDS5ReplicaPurgeDelay"
attributes in your "cn=replica,cn=<suffixDN>,cn=mapping
tree,cn=config" entry.
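
To make the timing concrete, here is a minimal Python sketch of that schedule. It is a simplified model, not the server's code: it assumes the reap thread first fires one full nsDS5ReplicaTombstonePurgeInterval after dirsrv starts and then repeats, which would explain why the IO spike tracks the restart time rather than midnight. The `reap_times` helper and the restart time are hypothetical.

```python
from datetime import datetime, timedelta

# Default nsDS5ReplicaTombstonePurgeInterval, in seconds (1 day).
DEFAULT_PURGE_INTERVAL = 86400


def reap_times(start, count, interval_seconds=DEFAULT_PURGE_INTERVAL):
    """Return the first `count` reap-thread start times after `start`.

    Simplified model (an assumption, not the server's code): the reap
    thread first fires one full interval after dirsrv starts and then
    repeats every interval, so its schedule follows the restart time
    rather than the wall clock.
    """
    step = timedelta(seconds=interval_seconds)
    return [start + step * k for k in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical restart time: Wednesday 2012-05-16 at 3:17pm.
    started = datetime(2012, 5, 16, 15, 17)
    for t in reap_times(started, 3):
        print(t.strftime("%a %Y-%m-%d %H:%M"))
```

Under this model, restarting dirsrv just before the next reap only pushes the whole job another 24 hours out, so the backlog keeps growing.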

I have no "nsDS5ReplicaTombstonePurgeInterval" value set (so it's using
the default), and "nsDS5ReplicaPurgeDelay" is set to 3600.
OK, so this means that every 24 hours, the tombstone reap thread will look for tombstones older than 1 hour (3600 seconds) and remove them.
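
A minimal Python sketch of that selection rule follows. It assumes a tombstone qualifies once it is older than nsDS5ReplicaPurgeDelay; the real server works from replication metadata rather than simple datetimes, and the `reapable` helper and sample times are hypothetical.

```python
from datetime import datetime, timedelta


def reapable(deletion_times, now, purge_delay_seconds):
    """Return the deletion times whose tombstones are old enough to reap.

    Simplified model (an assumption): a tombstone qualifies once it is
    older than nsDS5ReplicaPurgeDelay. The real server works from
    replication metadata, not wall-clock datetimes.
    """
    cutoff = now - timedelta(seconds=purge_delay_seconds)
    return [t for t in deletion_times if t < cutoff]


if __name__ == "__main__":
    now = datetime(2012, 5, 17, 15, 17)  # hypothetical reap run
    deleted = [
        now - timedelta(minutes=30),  # too recent for a 1-hour delay
        now - timedelta(hours=2),
        now - timedelta(days=3),
    ]
    print(len(reapable(deleted, now, 3600)))    # 1-hour delay: prints 2
    print(len(reapable(deleted, now, 604800)))  # 1-week default: prints 0
```

With a 1-hour delay almost every tombstone is eligible at each run, so the amount of work per run is driven by how many deletes happened since the last successful reap.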


You can search for "(objectclass=nstombstone)" as Directory Manager to
see how many tombstone entries you have.
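
One way to get that count (a sketch, assuming the OpenLDAP client tools) is to run something like `ldapsearch -x -D "cn=Directory Manager" -W -b "<suffixDN>" "(objectclass=nstombstone)" dn` and count the entries in its LDIF output. The hypothetical helper below does the counting by looking for "dn:" lines.

```python
def count_entries(ldif_text):
    """Count entries in LDIF text by counting their "dn:" lines.

    Continuation lines of a folded DN start with a space, so they are
    not counted twice; "dn::" (a base64-encoded DN) also matches.
    """
    return sum(
        1 for line in ldif_text.splitlines() if line.lower().startswith("dn:")
    )
```

Keep in mind that the count includes the RUV entry, which also carries the nsTombstone objectclass.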
I have a LOT of tombstone entries, over 200k on this one server (I'm
guessing that's because I've been restarting the process for over a week
now, not letting it run the cleanup process).
That's possible if you really do 200k delete operations in 1 week, but that sounds like a lot. It would seem that these tombstones have been building up for a longer time than 1 week.

So, any suggestions on what I can do to fix this? The process that's
reaping the entries uses so much IO that queries time out; older
versions of the software did not exhibit this behavior. In fact, I can
reinitialize the entire replica faster than this thing reaps the
entries: it takes 7 minutes to reinit a replica, but when this issue
first started I let dirsrv run much longer than that before restarting it.
Because the tombstone search matches so many entries, the server has to walk your entire database, which is why you see the IO spiking.

Perhaps also try increasing nsslapd-idlistscanlimit so that the index can hold the entire candidate list of tombstones to delete; see:
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/9.0/html/Administration_Guide/Managing_Indexes.html#About_Indexes-Overview_of_the_Searching_Algorithm
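
The effect of that limit can be sketched with a simplified Python model (an assumption, not the server's code): when an indexed lookup matches more IDs than nsslapd-idlistscanlimit, it returns an "all IDs" marker and every entry in the database becomes a candidate. The helper names and the 1,000,000-entry database size are hypothetical; 4000 is used as the usual default limit.

```python
ALLIDS = object()  # hypothetical marker meaning "every entry is a candidate"


def candidate_ids(matching_ids, idlistscanlimit):
    """Simplified model of an indexed lookup such as objectclass=nstombstone.

    When more entries match than nsslapd-idlistscanlimit allows, the
    index stops tracking individual IDs and returns ALLIDS instead.
    """
    matching_ids = list(matching_ids)
    if len(matching_ids) > idlistscanlimit:
        return ALLIDS
    return matching_ids


def entries_examined(matching_ids, total_entries, idlistscanlimit):
    """How many entries must be read from disk and filtered one by one."""
    ids = candidate_ids(matching_ids, idlistscanlimit)
    return total_entries if ids is ALLIDS else len(ids)


if __name__ == "__main__":
    tombstones = range(200_000)  # roughly the count reported in this thread
    database_size = 1_000_000    # hypothetical total number of entries
    print(entries_examined(tombstones, database_size, 4000))     # whole DB
    print(entries_examined(tombstones, database_size, 250_000))  # tombstones only
```

In this model, raising the limit above the tombstone count turns a full database walk into a read of just the tombstone entries.
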
What you could do is export your database with "db2ldif -r". This preserves the replication-related data in the LDIF. You can then remove the tombstone entries from the LDIF file with a script and reimport it. You would have to do this on each server, or do it on one master and then reinitialize the rest of your servers. One thing to watch out for: you do not want to remove the RUV entry, which also has the "nstombstone" objectclass. The RUV entry has the value "nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff", which you can use to distinguish it from the rest of the tombstones.

Should I make it purge more frequently so there are fewer entries to
reap?  Or is this just some weird bug?
I'd leave the purge settings as they are.

--
Brad
--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users
