On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote: > > If we're stalling on lock_buffer(), that implies that buffer was being > > written, and for some reason it was taking a very long time to > > complete. > > > > Yes. > > > It might be worthwhile to put a timestamp in struct dm_crypt_io, and > > record the time when a particular I/O encryption/decryption is getting > > queued to the kcryptd workqueues, and when they finally squirt out. > > > > That somewhat assumes that dm_crypt was at fault which is not unreasonable > but I was skeptical as the workload on dm_crypt was opening a maildir > and mostly reads. Hmm... well, I've reviewed all of the places in the ext4 and jbd2 layer where we call lock_buffer(), and with one exception[1] we're not holding the the bh locked any longer than necessary. There are a few places where we grab a spinlock or two before we can do what we need to do and then release the lock'ed buffer head, but the only time we hold the bh locked for long periods of time is when we submit metadata blocks for I/O. [1] There is one exception in ext4_xattr_release_block() where I believe we should move the call to unlock_buffer(bh) before the call to ext4_free_blocks(), since we've already elevanted the bh count and ext4_free_blocks() does not need to have the bh locked. It's not related to any of the stalls you've repored, though, as near as I can tell (none of the stack traces include the ext4 xattr code, and this would only affect external extended attribute blocks). Could you code which checks the hold time of lock_buffer(), measuing from when the lock is successfully grabbed, to see if you can see if I missed some code path in ext4 or jbd2 where the bh is locked and then there is some call to some function which needs to block for some random reason? What I'd suggest is putting a timestamp in buffer_head structure, which is set by lock_buffer once it is successfully grabbed the lock, and then in unlock_buffer(), if it is held for more than a second or some such, to dump out the stack trace. Because at this point, either I'm missing something or I'm beginning to suspect that your hard drive (or maybe something the block layer?) is simply taking a long time to service an I/O request. Putting in this check should be able to very quickly determine what code path and/or which subsystem we should be focused upon. Thanks, - Ted -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>