Christoph -

Thank you for getting back to me.

The kernel I am using is not a vanilla kernel.org 2.6.32. It is part of the RHEL/CentOS 6 distribution, which has many bug fixes backported, at least up to 2.6.38 or so. Technically, it's their latest kernel.

The bug is very difficult to reproduce even on this kernel. It occurs while mounting a snapshot of a very large (40TB) filesystem that is in very active, continuous use. Once the filesystem snapshot is in that state, the hang is reproducible 100% of the time (i.e. on every mount), but it's not clear what pushes it there. Unfortunately, a kernel upgrade on that system is currently not possible.

Note that the lockup occurs while trimming the free list in xfs_alloc.c:xfs_alloc_fix_freelist when the list is too long (look for the "Make the freelist shorter if it's too long" comment inside this function). For some reason the buffer then gets double-locked inside xfs_btree_get_bufs, and the mount hangs forever. I suspect we are not seeing this more frequently because free list trimming does not typically happen during recovery.

I've looked through the patches to the XFS stack in the kernel.org git tree and found virtually no changes to this particular area, nor any references to something similar. I can probably do more research, but would really appreciate some guidance. Would it help if I obtained a metadata backup from that system? What could possibly cause a deadlock when log recovery has essentially no concurrency? Would it help to debug this by somehow forcing free list trimming during recovery?

Thanks again for your help.

Kirill

-----Original Message-----
From: Christoph Hellwig [mailto:hch@xxxxxxxxxxxxx]
Sent: Friday, March 30, 2012 12:07 PM
To: Kirill Malkin
Cc: xfs@xxxxxxxxxxx; xfs-masters@xxxxxxxxxxx
Subject: Re: bug #917 - deadlock on log recovery

On Thu, Mar 22, 2012 at 01:34:00PM -0400, Kirill Malkin wrote:
> Hi,
>
> I am wondering if someone had a chance to look at the bug #917. I
> filed it a couple of weeks ago, but haven't seen any action. We are
> running into it quite a lot, and the only way out of it is to reboot
> the OS and drop the log. Below is another stack trace that is slightly
> different from the one I filed, but apparently it is the same bug.
>
> Please let me know if you need any other input.

Can you reproduce this with a recent kernel? 2.6.32 is fairly old and
a lot of things have changed in this area. I quickly looked over the
trace and nothing obvious springs to mind.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs