On Thu, Jan 31, 2013 at 12:45:41PM -0500, Brian Foster wrote: > On 01/30/2013 04:59 PM, Dave Chinner wrote: > > On Wed, Jan 30, 2013 at 11:02:43AM -0500, Brian Foster wrote: > >> (added Dave and the list back on CC) > >> > >> On 01/30/2013 09:07 AM, Mark Tinguely wrote: > >>> On 01/30/13 00:05, Dave Chinner wrote: > >>>> On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote: > ... > >> > >> Thanks guys. This certainly looks nicer than messing with the lock > >> wrapper, but is it susceptible to the same problem? In other words, does > >> this fix the problem or just tighten the window? > > > > That's what I need to think about more - the only difference here is > > that we are checking the flag before the down_trylock() instead of > > after.... > > > >> So if the buf lock covers the pinned state (e.g., buffer gets locked, > >> added to a transaction, the transaction gets committed and pins and > >> unlocks the buffer, IIUC) and the stale state (buf gets locked, added to > >> a new transaction and inval'd before the original transaction was > >> written ?), but we don't hold the buf lock in xfs_buf_item_push(), how > >> can we guarantee the state of either doesn't change between the time we > >> check the flags and the time the lock fails? > > > > ... but the order of them being set and checked may be significant > > and hence checking the stale flag first might be sufficient to avoid > > the pin count race and hence the log force. Hence this might just > > need a pair of memory barriers - one here and one in xfs_buf_stale() > > - to ensure that we always see the XBF_STALE flag without needing to > > lock the buffer first. > > > > I _think_ I follow your train of thought. If we're racing on the pin > check, presumably the lock holder is committing the transaction and we > should either already see the buffer being stale, being pinned or we > should get the lock (assuming the order is: stale, pinned, unlocked). Right, that is the order in which operations occur. Marking a buffer stale happens in the transaction body, the pin and subsequent unlock occur some time later during the transaction commit. > That aside for a moment, here's some specific tracepoint (some of which > I've hacked in) data for when the recursion occurs: .... > ... so as expected, the buffer is marked stale, we attempt the trylock, > the buf is pinned, we run the log force and we're dead. *nod* > From the looks of the trace, I'd expect an additional stale check to > eliminate the ability to reproduce this, but that doesn't necessarily > make it correct of course. Regardless, I'm putting that to the test now > and letting it run for a bit while we get this sorted out. > > I also need to stare at the code some more. My pending questions are: > > - Is it always reasonable to to assume/consider a stale buf as pinned in > the context of xfsaild? A stale buffer in the AIL is either pinned (transaction committed) or on it's way to being pinned (transaction that marked it stale is currently in progress). When the transaction commit completes, the IOP_UNPIN() call will remove the item from the AIL if it is the last reference to the item. Hence we generally do not get unpinned, stale buffers in the AIL because the last reference to the stale buffer is usually the transaction that marked it stale.... > - If we currently reproduce the following sequence: > > A xfsaild > stale > (!pinned) ==> trylock() > pinned > (!trylock && pinned && stale) > ==> xfs_log_force() (boom) > > ... what prevents the following sequence from occurring sometime in the > future or with some alternate high-level sequence of events? > > A xfsaild > locked > (!pinned && !stale) ==> trylock() > pinned > stale > (!trylock && pinned && stale) > ==> xfs_log_force() You can't get that order and trigger the race. If the item is pinned before it is marked stale, that means we will always see pin count because it was pinned by a previous transaction commit. Pinning occurs on the commit of the first transaction commit that inserts the item into the CIL. It doesn't get unpinned until the CIL is checkpointed and inserted into the AIL. Hence the order of operations that marks an uncommitted buffer stale should always be lock->stale->pinned->unlock. However, if the buffer has been previously modified and is in the CIL, you'll see: lock->pinned->CIL insert->unlock.....->lock->stale->unlock If the buffer was modified a while back and the CIL is committed, the buffer will be in the AIL but not the CIL. If the buffer was modified a while back, and then again recently, it will be in both pinned in the CIL and the AIL. Neither of these cases can trigger this problem because the pinned check will fire reliably. Hence the specific case here is the buffer has been previously modified, the CIL committed so it's in the AIL, and we are marking the buffer stale as the first modification of the buffer after it was added to the AIL. IOWs, the order that is of concern is this while the item is in the AIL: lock->stale->pin->CIL insert->unlock So in terms of the xfsaild racing and causing problems, the only case it will occur in is this: Thread 1 xfsaild lock xfs_buf_item_push .... not pinned stale trylock <delay> commit pin if(pinned && stale) <splat> unlock So what I was asking is whether we can do checks in the order of: smb_mb(); if (stale) return if (pinned) return What I've just realised is that we really don't care if we race in xfs_buf_item_push(). The race that matters is in xfs_buf_trylock() and at that point it is too late to avoid it. So I think your original patch is on the right path but having the xfsaild handle the log force gets rid of most of the nasty cruft it had.... ie. we're going to have to tell xfs_buf_trylock() that we should not do a log force if the lock fails, and return XFS_ITEM_PINNED rather than XFS_ITEM_LOCKED in xfs_buf_item_push().... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs