On 08/03/2010 12:01 PM, Theodore Ts'o wrote:
The first patch in this patch series hasn't changed since I last
posted it, but I'm including it again for the benefit of the folks
on ocfs2-dev.
Thanks to some work done by Eric Whitney, when he accidentally ran the
command "mkfs.ext4 -t xfs", and created an ext4 file system without a
journal, it appears that the main scalability bottleneck for ext4 is in
the jbd2 layer. In fact, his testing on a 48-core system shows that on
some workloads, ext4 is roughly comparable with XFS!
The lockstat results indicate that the main bottlenecks are in the
j_state_lock and t_handle_lock, especially in start_this_handle() in
fs/jbd2/transaction.c. A previous patch, which removed an unneeded
grabbing of j_state_lock in jbd2_journal_stop(), relieved pressure on that
lock and was noted to make a significant difference for dbench on a
kernel with CONFIG_PREEMPT_RT enabled, as well as on a 48-core AMD
system from HP. This patch is already in 2.6.35, and the benchmark
results can be found here: http://free.linux.hp.com/~enw/ext4/2.6.34/
This patch series removes all exclusive spinlocks when starting and
stopping jbd2 handles, which should improve things even more. Since
OCFS2 uses the jbd2 layer, and the second patch in this patch series
touches ocfs2 a wee bit, I'd appreciate it if you could take a look and
let me know what you think. Hopefully, this should also improve OCFS2's
scalability.
Best regards,
- Ted

Theodore Ts'o (3):
      jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop
      jbd2: Change j_state_lock to be a rwlock_t
      jbd2: Remove t_handle_lock from start_this_handle()

 fs/ext4/inode.c       |    4 +-
 fs/ext4/super.c       |    4 +-
 fs/jbd2/checkpoint.c  |   18 +++---
 fs/jbd2/commit.c      |   42 +++++++-------
 fs/jbd2/journal.c     |   94 +++++++++++++++----------------
 fs/jbd2/transaction.c |  149 ++++++++++++++++++++++++++++---------------------
 fs/ocfs2/journal.c    |    4 +-
 include/linux/jbd2.h  |   12 ++--
 8 files changed, 174 insertions(+), 153 deletions(-)

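
For readers who want a concrete picture of what "use atomic variables
to avoid taking a lock" means in the first patch title, here is a small
userspace sketch with C11 atomics. It is an analogy under assumptions,
not the patch itself: struct txn, handle_start() and handle_stop() are
hypothetical names, and the two counters merely stand in for per-
transaction bookkeeping (attached handles and reserved credits). Each
counter is updated with a single atomic read-modify-write, so the stop
path needs no spinlock just to adjust them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a transaction; not the kernel's transaction_t. */
struct txn {
    atomic_int updates;              /* handles currently attached */
    atomic_int outstanding_credits;  /* credits reserved by those handles */
};

static void txn_init(struct txn *t)
{
    atomic_init(&t->updates, 0);
    atomic_init(&t->outstanding_credits, 0);
}

/* Attach a handle that reserves nblocks credits. */
static void handle_start(struct txn *t, int nblocks)
{
    atomic_fetch_add(&t->outstanding_credits, nblocks);
    atomic_fetch_add(&t->updates, 1);
}

/*
 * Detach a handle and give back its unused credits.
 * Returns 1 if this was the last attached handle, 0 otherwise.
 */
static int handle_stop(struct txn *t, int unused)
{
    atomic_fetch_sub(&t->outstanding_credits, unused);
    int prev = atomic_fetch_sub(&t->updates, 1);  /* value before the decrement */
    assert(prev > 0);
    return prev == 1;
}
```

Because atomic_fetch_sub() returns the previous value, exactly one
caller sees the count drop from 1 to 0, which is the natural place to
wake anyone waiting for the transaction to become idle.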
My 48-core test results for these patches as applied to 2.6.35 can be
found at:
http://free.linux.hp.com/~enw/ext4/2.6.35
Both the Boxacle large_file_creates and random_writes workloads improved
significantly and consistently with these patches, apparently in the
single-threaded case as well as at increased scale.
The graphs at the URL show one instance of several runs I made to
establish repeatability.
I've also taken unmodified 2.6.35 ext4, ext4 without a journal, and xfs
data for comparison. In addition, I've collected lock stats and more
detailed performance data for reference.
Thanks, Ted!
Eric