Re: Strange yum update hang (or something else) in Rawhide .. how to debug this further?

Panu Matilainen <pmatilai@xxxxxxxxxxxxxxx> · Tue, 05 Feb 2013 09:24:41 +0200

On 02/05/2013 12:32 AM, Richard W.M. Jones wrote:
On Mon, Feb 04, 2013 at 07:17:35PM +0200, Panu Matilainen wrote:
On 02/04/2013 07:01 PM, Richard W.M. Jones wrote:
On Mon, Feb 04, 2013 at 04:38:08PM +0000, Richard W.M. Jones wrote:

   Cleanup    : cpp-4.8.0-0.7.fc19.x86_64                                215/262
   Cleanup    : gdb-7.5.50.20130118-2.fc19.x86_64                        216/262
   Cleanup    : 1:findutils-4.5.10-7.fc19.x86_64                         217/262
   Cleanup    : spice-server-0.12.2-2.fc19.x86_64                        218/262
   Cleanup    : cracklib-2.8.22-2.fc19.x86_64                            219/262
   Cleanup    : libvirt-daemon-driver-interface-1.0.1-6.fc19.x86_64      220/262
   Cleanup    : libvirt-daemon-driver-nodedev-1.0.1-6.fc19.x86_64        221/262
   Cleanup    : libvirt-daemon-driver-nwfilter-1.0.1-6.fc19.x86_64       222/262
   Cleanup    : libvirt-daemon-driver-secret-1.0.1-6.fc19.x86_64         223/262
   Cleanup    : libvirt-daemon-1.0.1-6.fc19.x86_64                       224/262
   Cleanup    : libvirt-client-1.0.1-6.fc19.x86_64                       225/262
   Cleanup    : cyrus-sasl-2.1.25-2.fc19.x86_64                          226/262
   Cleanup    : openldap-2.4.33-3.fc19.x86_64                            227/262
   Cleanup    : nss-tools-3.14.1-3.fc19.x86_64                           228/262
   Cleanup    : nss-sysinit-3.14.1-3.fc19.x86_64                         229/262
   Cleanup    : nss-3.14.1-3.fc19.x86_64                                 230/262
(and here it hangs, for at least 20 minutes)

So how odd is this?  Suddenly it leaps back into life, after maybe
30-40 minutes.

Sounds like https://bugzilla.redhat.com/show_bug.cgi?id=860500

Yes, this looks similar.

It's possible that I ran a non-root yum command in another terminal.

A non-root yum/rpm/similar command wouldn't do it. Only processes running 
as root can participate in the shared environment (those __db.* files) 
locking; others use a "private environment", which pretty much equals 
no locking at all.
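
To make that rule concrete, here is a rough Python sketch (not rpm's actual code, which is C). The helper names `region_files` and `joins_shared_env` are made up for illustration, the default path /var/lib/rpm is an assumption about where the rpmdb lives, and the euid check simply restates the rule above rather than reproducing how rpm decides.

```python
import glob
import os


def region_files(dbpath="/var/lib/rpm"):
    """Return the Berkeley DB environment region files (__db.*) under dbpath."""
    return sorted(glob.glob(os.path.join(dbpath, "__db.*")))


def joins_shared_env(euid=None):
    """Restate the rule above: only root takes part in shared-environment locking."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0
```

Calling `joins_shared_env()` from a normal user shell returns False, which is why a non-root yum in another terminal can't be the culprit.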

So whatever is causing the jam is running as root, and equally 
only a root process can unjam it. It could even be the same thing that 
caused the jam running again: it's quite clearly something that runs 
automatically in the background, more or less periodically, 
occasionally exiting or crashing without freeing the rpmdb iterator it 
holds. Whether it's time-based or triggered by some other "external" 
event I dunno. And when it causes a jam, it's either still running when 
yum is started, or has started after yum.
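
One way to hunt for the culprit while yum is stuck is to look for other processes that have files under the rpmdb directory open. The following is a minimal sketch, assuming a Linux /proc layout and /var/lib/rpm as the database path; `rpmdb_holders` is a hypothetical helper, not part of rpm or yum. It only looks at open file descriptors, so memory-mapped region files would have to be looked up in /proc/<pid>/maps instead, and it needs to run as root to see other users' processes.

```python
import os


def rpmdb_holders(dbpath="/var/lib/rpm", proc="/proc"):
    """Return {pid: [open files under dbpath]} for processes we may inspect."""
    holders = {}
    prefix = os.path.join(dbpath, "")  # dbpath with a trailing separator
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were looking
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Running `rpmdb_holders()` a few times during the hang and comparing the pids against `ps` output should at least narrow down the suspects.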

Rpm uses Berkeley DB's "Concurrent Data Store" model for its database. 
This is a simple model which supposedly provides deadlock-free 
operation without the caller having to bother with explicit locking, but 
unfortunately this only works when all callers are well-behaved. Not 
entirely unlike multitasking in Windows 3.x... All it takes is a single 
buggy application forgetting to release its rpmdb iterators (or crashing 
while holding them) to block a concurrent writer "forever". Stale locks 
from no longer active processes are automatically cleaned up, but only on 
rpmdb open, so a potentially long-running application like yum can get 
stuck if the bad apple comes along after yum has started.
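
The failure mode can be shown with a toy model. This is only a sketch of the many-readers/one-writer idea, not Berkeley DB or rpm code; the class `ToyCDS` and its method names are invented for illustration, and real CDS locking is more involved than a dictionary of counters.

```python
class ToyCDS:
    """Toy model: many readers or one writer. Not Berkeley DB itself."""

    def __init__(self):
        self.readers = {}  # pid -> number of open read cursors (iterators)

    def open_iterator(self, pid):
        """A reader takes a read lock by opening an iterator."""
        self.readers[pid] = self.readers.get(pid, 0) + 1

    def close_iterator(self, pid):
        """A well-behaved reader releases its lock again."""
        self.readers[pid] -= 1
        if self.readers[pid] == 0:
            del self.readers[pid]

    def can_write(self):
        """A writer may proceed only while no read cursor is held."""
        return not self.readers

    def reopen(self, alive_pids):
        """Model the cleanup done on rpmdb open: drop locks of dead pids."""
        self.readers = {p: n for p, n in self.readers.items() if p in alive_pids}


db = ToyCDS()
db.open_iterator(4242)        # the bad apple opens an iterator, then dies
blocked = not db.can_write()  # the writer (yum) is now stuck
db.reopen(alive_pids=set())   # only a fresh open notices that 4242 is gone
unblocked = db.can_write()
```

In the toy run, `blocked` and `unblocked` both end up True: the writer stays stuck until somebody reopens the database and the dead reader's lock is discarded.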

Come to think of it, it should be possible to have rpm check for stale 
locks when opening write cursors. That would help at least in some of the cases 
(where the bad caller has already exited/died), but it'd still be 
"vulnerable" to a long-running process hanging on to iterators.

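A sketch of what such a stale-lock check boils down to, again in Python rather than rpm's C and with made-up helper names (`pid_alive`, `drop_stale`): probe each lock holder with signal 0 and forget the ones that no longer exist. It assumes the lock table records the holder's pid, and it shows the limitation too, since a holder that is alive but misbehaving is never dropped.

```python
import os


def pid_alive(pid):
    """Best-effort liveness probe using signal 0 (no signal is delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: its locks are stale
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def drop_stale(lock_holders):
    """Return only the lock holders whose process still exists."""
    return {pid for pid in lock_holders if pid_alive(pid)}
```

For example, `drop_stale({os.getpid(), some_dead_pid})` keeps only the caller's own pid.
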
	- Panu -