Re: Isolating a chronic rpmdb corruption problem.



Let me make it more degenerate; maybe you can help me make sense of whether this is a side issue.

I sat down and built a tool to run db_verify against all my systems via mcollective. Across a large suite of systems, I'm finding a noticeable number of broken ones even just a few days after running a full patch via Spacewalk's direct-rpm method.

While cleaning them up, I noticed _more_ systems showing up as broken.

Then I noticed that db_verify's man page says it doesn't do proper locking. Nice!

So I turned on an auditctl watch for anything touching /var/lib/rpm/*, ran db_verify in a while-1 loop against all tables to see what would happen, and after about a minute (so, 120 or so db_verify calls later) it started smoking, segfaulting while opening __db.002.
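For the "whodunit" side, here is a minimal sketch of turning that audit trail into a per-program tally. It assumes the watch was created with a key, e.g. `auditctl -w /var/lib/rpm -p wa -k rpmdb` (the key name `rpmdb` is a hypothetical choice), and that it reads raw audit.log lines. It keeps only SYSCALL records carrying that key and counts them by `exe`/`comm`. `ausearch -k rpmdb` can do the filtering on its own; this only adds the counting. The helper names `parse_fields` and `tally_writers` are illustrative, not from any existing tool.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches key=value pairs in a raw audit.log record; values may be quoted.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line: str) -> Dict[str, str]:
    """Split one raw audit record into a dict of its key=value fields."""
    fields = {}
    for name, value in _FIELD.findall(line):
        fields[name] = value.strip('"')
    return fields


def tally_writers(lines: Iterable[str], key: str = "rpmdb") -> Counter:
    """Count SYSCALL records tagged with `key`, grouped by (exe, comm).

    `key` is whatever was passed to `auditctl -k`; "rpmdb" is only an
    example name. Records without that key (e.g. key=(null)) are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = parse_fields(line)
        if fields.get("type") != "SYSCALL" or fields.get("key") != key:
            continue
        who: Tuple[str, str] = (fields.get("exe", "?"), fields.get("comm", "?"))
        counts[who] += 1
    return counts
```

Feeding it an open handle on /var/log/audit/audit.log and printing `.most_common(10)` should show which executables touch the rpmdb most often. A surprise entry there (something that is not yum, rpm, or a Spacewalk/Puppet agent) would be a good lead.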

This whole story is giving me the proper horrors of someone who's seeing how the sausage is made for the first time.

It does raise two more questions:

1) Should I expect db_verify to do the opposite of what it's supposed to do now and again? That is, should it, all by its lonesome, occasionally ruin the __db files?

2) Is there a proper way to check on the current health of an RPM database that doesn't carry this no-lock-ruins-everything risk?
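One lower-risk pattern, offered as a sketch rather than a known-good procedure: copy the table files out of /var/lib/rpm into a scratch directory, skip the `__db.*` environment files and hidden lock files, and run db_verify only on the copies, so it never attaches to the live environment. The function name `verify_copy` and the injectable `run` parameter are illustrative assumptions. The `run` parameter exists only so the logic can be exercised without db_verify installed. The sketch also assumes a Python 3 interpreter, which CentOS 6 does not ship as its system Python. A copy taken while rpm or yum is mid-write can itself be inconsistent, so a failure here is a prompt to re-check, not proof of corruption.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Callable, Dict, List

RPMDB_DIR = "/var/lib/rpm"


def default_run(argv: List[str], cwd: str) -> int:
    """Run a command inside `cwd` and return its exit status."""
    return subprocess.run(argv, cwd=cwd,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def verify_copy(rpmdb_dir: str = RPMDB_DIR,
                run: Callable[[List[str], str], int] = default_run
                ) -> Dict[str, int]:
    """Copy rpmdb tables to a temp dir and run db_verify on each copy.

    Returns {table_name: exit_status}; 0 means db_verify reported no
    problems. The live __db.* files are never copied or opened.
    """
    results: Dict[str, int] = {}
    with tempfile.TemporaryDirectory(prefix="rpmdb-check-") as scratch:
        tables = []
        for name in sorted(os.listdir(rpmdb_dir)):
            src = os.path.join(rpmdb_dir, name)
            # Skip Berkeley DB environment files (__db.*), hidden lock
            # files, and anything that isn't a regular file.
            if name.startswith(("__db.", ".")) or not os.path.isfile(src):
                continue
            shutil.copy2(src, os.path.join(scratch, name))
            tables.append(name)
        # Verify only after everything is copied, so the live files are
        # read once, back to back, and db_verify only ever sees the copies.
        for name in tables:
            results[name] = run(["db_verify", name], scratch)
    return results
```

Called with its defaults on a host, `verify_copy()` returns a dict mapping each table name to db_verify's exit status; any nonzero entry is worth a second pass before acting on it.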

On Mon, Feb 10, 2014 at 2:35 PM, Tristan Smith <triss@xxxxxxxxxxxxxxxxxx> wrote:
Hiya, folks.

I'm having a bit of a time in my CentOS 6 environment with what I'm guessing is some kind of knuckleheaded behavior in one or more of my utilities.
We have Spacewalk and Puppet working in general harmony, but I have a chronic issue with a significant percentage (call it... 10%) of my hosts turning up with rpmdb problems on a regular basis. Not the same hosts every time, either. I'm drawing some correlation to relatively idle systems, but it may be BS.

When yum tries to install on a borked system, I get error 12s; db_verify comes up with 'Cannot allocate memory' for Basenames (or sometimes just Packages). rpm --rebuilddb almost universally makes them okayish again, but not entirely: I'm enjoying lost dependencies here and there (yum check dependencies crying into its beer a lot, and I've got an xargs nightmare to re-install the missing packages).

Basically, I've got a handle on an ever-lengthening list of mitigation methods, but what I can't seem to isolate is whodunit. I have no idea what's reaching into the DB ham-fistedly and making a mess quite so often.

Does anyone have suggestions as to what in hell I should be doing to narrow down causes?

Rpm-list mailing list
