[Yum] Strange hang...


On Wed, 1 Nov 2006, Brian Long wrote:

> On Wed, 2006-11-01 at 08:58 -0500, Robert G. Brown wrote:
>> On Wed, 1 Nov 2006, seth vidal wrote:
>>
>>> rm -f /var/lib/rpm/__db*
>>> rpm --rebuilddb
>>
>> Yes, this was exactly it.  yum -d 10 showed that the hang was at the
>> rpmdb read, so I could figure it out.
>
> Robert, Seth,
>
> We've had so many problems with the futex issue that our auto-update
> script has a ton of logic in it before calling yum update.
>
> We run
>
>   /usr/lib/rpm/rpmdb_stat -c -h /var/lib/rpm | grep 'current locks' | cut -f 1
>
> and if the number is not 0, we remove the /var/lib/rpm/__db* files and
> rebuild the RPM database.
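
Brian's pre-update check could look roughly like the sketch below. It assumes the rpmdb_stat path and the /var/lib/rpm location quoted above (both may differ between releases), that it runs as root, and that no other rpm or yum process is active at the time. The function names current_locks and check_and_repair_rpmdb are made up for illustration. Note that it cannot tell a stale lock from one held by a live rpm process, which is exactly the cron-versus-user case discussed further down.

```shell
#!/bin/sh
# Sketch of a pre-update rpmdb lock check. Paths are taken from the thread
# and may differ on other releases; run as root with no rpm/yum active.

RPMDB_DIR=/var/lib/rpm
RPMDB_STAT=/usr/lib/rpm/rpmdb_stat

# Read "rpmdb_stat -c" output on stdin and print the current lock count.
# Berkeley DB prints each statistic as "<value><TAB><description>", so the
# first tab-separated field of the matching line is the number.
current_locks() {
    grep 'current locks' | head -n 1 | cut -f 1
}

# Hypothetical helper: if any locks are reported, remove the Berkeley DB
# environment files (__db.001 and friends) and rebuild the rpm database.
check_and_repair_rpmdb() {
    count=$("$RPMDB_STAT" -c -h "$RPMDB_DIR" | current_locks)
    if [ -z "$count" ]; then
        echo "could not read lock count from $RPMDB_STAT" >&2
        return 1
    fi
    if [ "$count" -ne 0 ]; then
        echo "rpmdb reports $count current lock(s); rebuilding" >&2
        rm -f "$RPMDB_DIR"/__db*
        rpm --rebuilddb
    fi
}
```

An update script would call check_and_repair_rpmdb just before running yum update.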

In my case it was probably a self-inflicted wound.  One thing about yum
that periodically drives me bananas is that it doesn't (IMO) handle
Ctrl-C sanely.  Say I start a massive yum update on my laptop and then
have to go somewhere and shut the laptop down, or I realize that I
mistyped something.  yum will spend three or four minutes on its usual
routine (find a repo, download any changed xml metadata, and only then
actually execute the command) before I get to re-enter the command
CORRECTLY.  Either way I have to wait it out, or suspend it and kill -9
it.  kill -9, of course, means that things don't always get cleaned up.
Last night I evidently killed it that way and left lockfiles behind, and
it must have been on the very last yum command I ran, because everything
had worked fine up to that point.

What I'd really like is:

   a) Automated handling of locks: a check for existing locks that bombs
with a message, and/or a flag that says "nuke any locks and rebuild the
rpmdb as needed".  As I said, GUI-level (non-sysadmin) users are NOT
going to be able to remove locks and rebuild the db, ever, so for yum to
be usable via a GUI by general users it basically can never lock up.
Really it should "never" lock up from this sort of state even when run
from a command line (so this is a bug, not just an annoying feature),
and especially not "silently", so that you have to wait a really long
time to see that it is hung and not just thinking or waiting on a
network resource.

   b) Ctrl-C should make yum exit gracefully, no matter what it is
doing, at the earliest possible moment, with an immediate message saying
"yes, I got your command to quit, be patient while I clean up" as it
works itself free of locks, interrupts downloads, etc.  It should NOT
make yum fall through to the next mirror of the repository it is
downloading from, as it does now.  Every similar (non-interactive)
application in the known universe quits on Ctrl-C.

   c) Some OTHER key sequence could cycle through the repos.  I'd suggest
Ctrl-N/Ctrl-P for Next/Previous repository, or some other hook that can
easily be connected to yumex.  That way slow repos, hung repos, and slow
networks can be skipped in real time without forcing yum to quit, while
quitting yum becomes easier than eight million Ctrl-C's or a Ctrl-Z
followed by kill -9.
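
Items a) and b) describe a familiar pattern for long-running tools: refuse to start while a live process holds the lock, recover a lock whose owner has died, and turn Ctrl-C into a prompt, announced cleanup. A minimal shell sketch of that pattern follows. It is not how yum is implemented; the lock directory path and the function names are hypothetical. A trap cannot catch kill -9, which is why the stale-lock check is needed at all.

```shell
#!/bin/sh
# Sketch of lock handling plus graceful Ctrl-C. LOCKDIR is a hypothetical
# path, not yum's real lock; mkdir is used because it is atomic.
LOCKDIR=${LOCKDIR:-/tmp/update-wrapper.lock}

# Take the lock, or explain why not. A lock whose recorded PID is no longer
# running (e.g. its owner got kill -9) is treated as stale and replaced.
# kill -0 needs permission to signal that PID, so run as the same user or root.
# Caveat: two runs recovering the same stale lock at once can still race.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo $$ > "$LOCKDIR/pid"
        return 0
    fi
    oldpid=$(cat "$LOCKDIR/pid" 2>/dev/null)
    if [ -z "$oldpid" ] || kill -0 "$oldpid" 2>/dev/null; then
        # Owner is alive (or has not written its PID yet): bomb with a message.
        echo "Lock $LOCKDIR is held by pid ${oldpid:-unknown}; exiting." >&2
        return 1
    fi
    echo "Removing stale lock left by dead pid $oldpid" >&2
    rm -rf "$LOCKDIR"
    mkdir "$LOCKDIR" 2>/dev/null || return 1
    echo $$ > "$LOCKDIR/pid"
}

release_lock() {
    rm -rf "$LOCKDIR"
}

# Handler for Ctrl-C (SIGINT) and SIGTERM: say so at once, clean up, quit.
on_interrupt() {
    echo "Got your request to quit; cleaning up, please wait..." >&2
    release_lock
    exit 130
}
```

A caller would install the handler with trap on_interrupt INT TERM, then run acquire_lock || exit 1, do its work, and finish with release_lock.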

> It's surprising the number of rpm --rebuilddb's we run on a daily basis
> across 7,000+ RHEL 3 and 4 hosts.  A bunch of fixes were put into place
> on RHEL 3 Update 5's rpm, so we've pushed it to all RHEL 3 boxes
> (including those not yet running Update 5+).  It reduced the issue, but
> we still have a few hundred rebuilds a day and we haven't been able
> to track it down.

Well, maybe the locks were NOT caused by interrupting yum.  Maybe there
are bugs somewhere else.  Either way, I think yum can be made robust
against the locks.  The thing one would have to worry about is cron job
A running yum, or rpm, while a user or sysadmin B is simultaneously
trying to run yum, or rpm, from a command line, another interface, or a
script.  yum has its own lock, which is good (it seems to work whenever
I've accidentally challenged it), but it sounds like it may still get
into a race condition with rpm, which presumably has its own independent
locks.  If that race is what leaves the locks behind, a consistent
protocol for requesting both lock sets (for example, always acquiring
them in the same order) can usually prevent it.
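
The simplest such protocol for two lock sets is a fixed acquisition order: every program takes lock A before lock B and releases them in reverse, so no process ever holds B while waiting for A, and two processes cannot deadlock on each other. The sketch below illustrates lock ordering with two hypothetical directory locks standing in for a yum-level and an rpm-level lock. It does not model how yum or rpm actually take their locks, and the paths and function names are assumptions.

```shell
#!/bin/sh
# Lock ordering sketch. FIRST_LOCK and SECOND_LOCK are hypothetical paths
# standing in for "the yum lock" and "the rpm lock".
FIRST_LOCK=${FIRST_LOCK:-/tmp/demo-first.lock}
SECOND_LOCK=${SECOND_LOCK:-/tmp/demo-second.lock}

# Wait until the directory lock can be created (mkdir is atomic).
lock_dir() {
    until mkdir "$1" 2>/dev/null; do
        sleep 1
    done
}

unlock_dir() {
    rmdir "$1"
}

# Run a command while holding both locks. Every caller takes FIRST_LOCK
# before SECOND_LOCK and releases them in reverse order, so no process
# holds SECOND_LOCK while waiting for FIRST_LOCK and no deadlock can form.
with_both_locks() {
    lock_dir "$FIRST_LOCK"
    lock_dir "$SECOND_LOCK"
    # Capture the command's status without tripping "set -e", so the
    # locks are always released.
    if "$@"; then status=0; else status=$?; fi
    unlock_dir "$SECOND_LOCK"
    unlock_dir "$FIRST_LOCK"
    return "$status"
}
```

Both cron jobs and interactive scripts would wrap their yum or rpm calls in with_both_locks; the ordering only helps if every participant follows it.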

   rgb

>
> /Brian/
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx


