On Fri, Apr 09, 2010 at 05:49:54PM +0200, Andi Kleen wrote:
> john stultz <johnstul@xxxxxxxxxx> writes:
> >
> > Further using lockstat I was able to isolate the contention down to
> > the journal j_state_lock, and then adding some lock owner tracking, I
> > was able to see that the lock owners were almost always in
> > start_this_handle, and jbd2_journal_stop when we saw contention (with
> > the freq breakdown being about 55% in jbd2_journal_stop and 45% in
> > start_this_handle).
>
> FWIW we've been also seeing this on larger systems without RT.
> The journal locks are the number one contention in some workloads.
> So it's not just a RT problem.

Yeah, I'm very much aware of that. What worries me is that locking
problems in the jbd2 layer could be very hard to debug, so we need to
make sure we have some really good testing as we make any changes.

Not taking the j_state_lock spinlock in jbd2_stop_lock() was
relatively easy to prove to be safe, but I'm really worried about
start_this_handle(). The locking around that is going to be subtle,
and it's not just the specific fields in the transaction and journal
handle.

And even with the jbd2_stop_lock() change, I'd really prefer some
pretty exhaustive testing, including power fail testing, just to make
sure we're in practice when/if we make more subtle or more invasive
changes to the jbd2 layer...

So I'm not waving the red flag, but the yellow flag (as they would say
in auto racing circles).

Regards,

- Ted