On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote:
> On 03/12/2014 02:54 PM, Waiman Long wrote:
> >+       /*
> >+        * Set the lock bit & clear the waiting bit simultaneously
> >+        * It is assumed that there is no lock stealing with this
> >+        * quick path active.
> >+        *
> >+        * A direct memory store of _QSPINLOCK_LOCKED into the
> >+        * lock_wait field causes problem with the lockref code, e.g.
> >+        *      ACCESS_ONCE(qlock->lock_wait) = _QSPINLOCK_LOCKED;
> >+        *
> >+        * It is not currently clear why this happens. A workaround
> >+        * is to use atomic instruction to store the new value.
> >+        */
> >+       {
> >+               u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED);
> >+               BUG_ON(lw != _QSPINLOCK_WAITING);
> >+       }
>
> It was found that when I used a direct memory store instead of an atomic op,
> the following kernel crash might happen at filesystem dismount time:
>
> [ 1529.936714] Call Trace:
> [ 1529.936714] [<ffffffff811c2d03>] d_walk+0xc3/0x260
> [ 1529.936714] [<ffffffff811c1770>] ? check_and_collect+0x30/0x30
> [ 1529.936714] [<ffffffff811c3985>] shrink_dcache_for_umount+0x75/0x120
> [ 1529.936714] [<ffffffff811adf21>] generic_shutdown_super+0x21/0xf0
> [ 1529.936714] [<ffffffff811ae207>] kill_block_super+0x27/0x70
> [ 1529.936714] [<ffffffff811ae4ed>] deactivate_locked_super+0x3d/0x60
> [ 1529.936714] [<ffffffff811aea96>] deactivate_super+0x46/0x60
> [ 1529.936714] [<ffffffff811ca277>] mntput_no_expire+0xa7/0x140
> [ 1529.936714] [<ffffffff811cb6ce>] SyS_umount+0x8e/0x100
> [ 1529.936714] [<ffffffff815d2c29>] system_call_fastpath+0x16/0x1b
>
> It was more readily reproducible in a KVM guest. It was harder to reproduce
> in a bare metal machine, but kernel crash still happened after several
> tries.
>
> I am not sure what exactly cause this crash, but it will have something to
> do with the interaction between the lockref and the qspinlock code. I would
> like more eyes on that to find the root cause of it.

I cannot reproduce with my series that has the one word write.
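
For reference, here is a minimal userspace sketch of the two variants being compared above, using C11 atomics rather than the kernel's xchg() and ACCESS_ONCE(). The struct, the function names and the constant values are all made up for illustration; the real _QSPINLOCK_* values and the lock layout are whatever the patch defines. The relaxed store only approximates a plain kernel store, and the sketch says nothing about why the crash happens; it only shows that both variants leave the same value in the field, while the exchange is a read-modify-write that also hands back the old value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed values for illustration only; the real _QSPINLOCK_* constants
 * come from the patch and may differ. */
#define SKETCH_LOCKED   ((uint16_t)0x0001)
#define SKETCH_WAITING  ((uint16_t)0x0100)

/* Hypothetical stand-in for the 16-bit lock_wait field of the qspinlock. */
struct sketch_qlock {
    _Atomic uint16_t lock_wait;
};

/* Variant from the quoted patch: atomically swap in LOCKED and return
 * the previous value so the caller can check it was WAITING. */
static uint16_t sketch_lock_xchg(struct sketch_qlock *q)
{
    return atomic_exchange(&q->lock_wait, SKETCH_LOCKED);
}

/* Variant under discussion: one store of the whole 16-bit field, with no
 * read-modify-write and no ordering beyond the store itself. */
static void sketch_lock_store(struct sketch_qlock *q)
{
    atomic_store_explicit(&q->lock_wait, SKETCH_LOCKED, memory_order_relaxed);
}

/* Run both variants starting from the WAITING state.
 * Returns 0 if both end up LOCKED and the exchange saw WAITING. */
static int sketch_run(void)
{
    struct sketch_qlock a, b;

    atomic_init(&a.lock_wait, SKETCH_WAITING);
    atomic_init(&b.lock_wait, SKETCH_WAITING);

    if (sketch_lock_xchg(&a) != SKETCH_WAITING)
        return 1;
    sketch_lock_store(&b);

    if (atomic_load(&a.lock_wait) != SKETCH_LOCKED)
        return 2;
    if (atomic_load(&b.lock_wait) != SKETCH_LOCKED)
        return 3;
    return 0;
}
```

Single-threaded, the two are indistinguishable, which is the point: any difference in behaviour has to come from ordering or from a concurrent writer to the same word, neither of which this sketch exercises.
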
What I did was turn my swap partition (who needs that anyway on a machine
with 16G of memory) into an XFS partition. Then I copied my linux.git onto
it and unmounted.

I'll try a few more times; the above trace seems to suggest it happens
during dcache cleanup, so I suppose I should read from the filesystem for a
bit and then unmount again.

Is there anything specific you did to make it go bang?