Re: background on the ext3 batching performance issue

Josef Bacik wrote:
On Thursday 28 February 2008 10:05:11 am Josef Bacik wrote:
On Thursday 28 February 2008 7:09:17 am Ric Wheeler wrote:
At the LSF workshop, I mentioned that we have tripped across an
embarrassing performance issue in the jbd transaction code which is
clearly not tuned for low latency devices.

The short summary is that we can do, say, 800 10k files/sec in a
write/fsync/close loop with a single thread, but drop down to under 250
files/sec with 2 or more threads.

This is pretty easy to reproduce with any small file write synchronous
workload (i.e., fsync() each file before close).  We used my fs_mark
tool to reproduce.
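
For readers without fs_mark at hand, the workload shape described above can be sketched as a minimal loop. This is not fs_mark itself: the helper name write_sync_files, the file name prefix, and the buffer size are assumptions made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fs_mark): create `count` files of
 * `size` bytes each in `dir`, calling fsync() on every file before
 * close(). Returns the number of files fully written and synced.
 */
int write_sync_files(const char *dir, int count, size_t size)
{
    char buf[4096];
    char path[4096];
    int done = 0;

    assert(dir != NULL && count >= 0);
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof(path), "%s/fsync_demo_%d.dat", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            break;

        size_t left = size;
        int ok = 1;
        while (left > 0) {
            size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
            ssize_t n = write(fd, buf, chunk);
            if (n <= 0) {
                ok = 0;
                break;
            }
            left -= (size_t)n;
        }

        /* The synchronous part: force the data out before closing. */
        if (ok && fsync(fd) != 0)
            ok = 0;
        if (close(fd) != 0)
            ok = 0;
        if (!ok)
            break;
        done++;
    }
    return done;
}
```

Running this loop from one thread and then from two or more threads against the same filesystem, and timing each run, is the comparison being described; the timing and threading harness is left out of the sketch.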

The core of the issue is the call to schedule_timeout_uninterruptible(1)
in the jbd transaction code, which causes us to sleep for 4ms:

        pid = current->pid;
        if (handle->h_sync && journal->j_last_sync_writer != pid) {
                journal->j_last_sync_writer = pid;
                do {
                        old_handle_count = transaction->t_handle_count;
                        schedule_timeout_uninterruptible(1);
                } while (old_handle_count != transaction->t_handle_count);
        }

This is quite topical to the concern we had with low latency devices in
general, but specifically things like SSDs.
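
As a rough check on the numbers, the 4ms figure is what a one-tick sleep costs if the kernel timer runs at HZ=250. That HZ value is an assumption here; the message states only the 4ms result. The sketch below works out the tick length and the ceiling on serialized operations per second when each one sleeps for at least one tick. The function names are made up for illustration.

```c
#include <assert.h>

/* Length of one timer tick (jiffy) in milliseconds for a given HZ. */
double tick_ms(unsigned int hz)
{
    assert(hz > 0);
    return 1000.0 / hz;
}

/*
 * Upper bound on operations per second when operations are serialized
 * and each one sleeps for `ticks` timer ticks.
 */
double max_ops_per_sec(unsigned int ticks, unsigned int hz)
{
    assert(ticks > 0);
    return 1000.0 / (ticks * tick_ms(hz));
}
```

Under that assumption, tick_ms(250) gives 4 and max_ops_per_sec(1, 250) gives 250, which lines up with the "under 250 files/sec" figure if every fsync ends up paying at least one tick.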
Your testcase does in fact show a weakness in this optimization, but look
at the more likely case, where you have multiple writers on the same
filesystem rather than one guy doing write/fsync.  If we wait we could
potentially add quite a few more buffers to this transaction before
flushing it, rather than flushing a buffer or two at a time.  What would
you propose as a solution?


Forgive me, I said that badly. Now that I've had my morning coffee, let me try again.

You are ping-ponging j_last_sync_writer back and forth between the two threads, so you don't get the speedup you would get with one thread, where we just bypass the next sleep because we know we've got one thread doing write/sync.

So this brings up the question: should we try to figure out whether we have multiple threads doing write/sync, and therefore exploiting the weakness in this optimization? And if we should, how would we do this properly? The only thing I can think to do is to track sync writers on a transaction, and if there is more than one, bypass this little snippet. In fact I think I'll go ahead and do that and see what fs_mark comes up with.

Thank you,

Josef
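
One way to picture the idea of tracking sync writers per transaction is the user-space model below. It is only a sketch and not the actual jbd change: the struct names, the fixed-size pid table, and the function should_wait_for_batch are all hypothetical. It returns whether the caller would enter the sleep loop from the snippet quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define MAX_TRACKED_WRITERS 8   /* hypothetical cap for this sketch */

/* Toy stand-in for a transaction: remembers distinct sync writers. */
struct txn_model {
    pid_t sync_writers[MAX_TRACKED_WRITERS];
    int nr_sync_writers;
};

/* Toy stand-in for the journal's j_last_sync_writer field. */
struct journal_model {
    pid_t last_sync_writer;
};

/* Record pid as a sync writer on this transaction if not already seen. */
static void note_sync_writer(struct txn_model *t, pid_t pid)
{
    for (int i = 0; i < t->nr_sync_writers; i++)
        if (t->sync_writers[i] == pid)
            return;
    if (t->nr_sync_writers < MAX_TRACKED_WRITERS)
        t->sync_writers[t->nr_sync_writers++] = pid;
}

/* Decide whether this handle should sleep hoping for more writers. */
bool should_wait_for_batch(struct journal_model *j, struct txn_model *t,
                           pid_t pid, bool h_sync)
{
    assert(j && t);
    if (!h_sync)
        return false;

    note_sync_writer(t, pid);

    /* Proposed bypass: several sync writers already share this transaction. */
    if (t->nr_sync_writers > 1)
        return false;

    /* Existing check: the same writer as last time skips the sleep. */
    if (j->last_sync_writer == pid)
        return false;

    j->last_sync_writer = pid;
    return true;
}
```

In this model, a lone sync writer waits once and is then recognized on later calls, while a second sync writer on the same transaction turns the wait off for both, which avoids the ping-pong on j_last_sync_writer. Whether that helps fs_mark in practice is exactly what would need measuring.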


Even worse, we go 4 times slower with 2 threads than we do with a single thread!

This code has tried several things in the past - reiserfs used to do a yield() at one point.

I am traveling until the weekend, but will be able to help with this when I get back in to my lab on Monday...

ric
