Question concerning the EXT3 Journaling code

Duane Cloud <cloud@xxxxxxxxxx> · Mon, 14 Aug 2006 14:08:01 -0500

Hello,

I've attached the output from "sh scripts/ver_linux" per the bug 
reporting guidelines, and a diff of the file fs/jbd/transaction.c.

At this point, I'm trying to hunt down why some system threads, which 
are executing the Lustre file system code, are taking an unexpectedly 
long time executing various ext3 file system functions.  I added some 
debug code to these system threads in order to find out where they are 
spending their time in the hopes that I can identify a place where they 
may be experiencing unexpected delays.

This debugging led me to look at the code in start_this_handle(), 
contained in file fs/jbd/transaction.c, and I have a question concerning 
the wake_up() logic for the thread which may go to sleep on 
<j_wait_transaction_locked>.

The thread will sleep as long as <j_barrier_count> is non-zero:

        repeat:
                spin_lock(&journal->j_state_lock);
                ...
                if (journal->j_barrier_count) {
                        spin_unlock(&journal->j_state_lock);
                        wait_event(...);
                        goto repeat;
                }
                ...
                if (...) {
                        prepare_to_wait(...);
                        spin_unlock(&journal->j_state_lock);
                        schedule();
                        finish_wait(...);
                        goto repeat;
                }
                ...

The last "if (...)" represents 2 additional conditions which can cause 
the thread to go to sleep in start_this_handle(), and loop back to the 
"repeat" label.

In looking at how the wake_up() occurs, there exists the following 
section of code:

      transaction->t_updates--;
      if (!transaction->t_updates) {
           wake_up(&journal->j_wait_updates);
           if (journal->j_barrier_count)
                  wake_up(&journal->j_wait_transaction_locked);
      }

It would seem to me that this wake_up() will end up being a no-op, as 
the thread being woken up will go back to sleep since <j_barrier_count> 
is non-zero.  I'm really not familiar with the journaling code, so I was 
hoping to pass on my thoughts in order to get some feedback.

What I did was make this wake_up() unconditional, and it appears to 
have had a positive impact on the delays experienced by the system 
threads I've been monitoring.

The other change I made, although it shouldn't affect the non-SMP system 
my kernel is running on, is to replace the spin_unlock()/wait_event() 
sequence with prepare_to_wait()/spin_unlock()/schedule()/finish_wait(), 
and to move the wake_up() in journal_unlock_updates() from after the 
spin_unlock() to just before the lock is given up.  My thinking is that 
the thread going to sleep needs to be on the wait queue before the 
wake_up() occurs, or it could miss this wake_up().
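
The ordering described above can be sketched in userspace with POSIX 
threads.  This is only an analogy, not kernel code: the struct, its 
fields, and the model_* functions are hypothetical stand-ins for 
j_state_lock, j_wait_transaction_locked, and j_barrier_count, and a 
condition variable takes the place of the kernel wait queue.  The waiter 
checks the count and goes to sleep without ever dropping the lock in 
between, and the waker issues its wake-up while still holding the lock, 
so the wake-up cannot fall into the gap between the check and the sleep.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for the journal fields discussed above. */
struct model_journal {
	pthread_mutex_t state_lock;       /* plays the role of j_state_lock */
	pthread_cond_t  transaction_wait; /* plays the role of j_wait_transaction_locked */
	int             barrier_count;    /* plays the role of j_barrier_count */
};

/* Waiter: tests the condition and sleeps with no unlocked window in between. */
static void *model_start_handle(void *arg)
{
	struct model_journal *j = arg;

	pthread_mutex_lock(&j->state_lock);
	while (j->barrier_count != 0) {
		/* Atomically releases state_lock and sleeps; the lock is
		 * held again when the call returns, so the loop re-checks
		 * the count safely. */
		pthread_cond_wait(&j->transaction_wait, &j->state_lock);
	}
	pthread_mutex_unlock(&j->state_lock);
	return NULL;
}

/* Waker: changes the condition and wakes waiters before giving up the lock. */
static void model_unlock_updates(struct model_journal *j)
{
	pthread_mutex_lock(&j->state_lock);
	--j->barrier_count;
	pthread_cond_broadcast(&j->transaction_wait);
	pthread_mutex_unlock(&j->state_lock);
}

/* Runs one waiter against one waker and returns the final barrier count. */
static int model_run(void)
{
	struct model_journal j;
	pthread_t waiter;
	int rc, final_count;

	pthread_mutex_init(&j.state_lock, NULL);
	pthread_cond_init(&j.transaction_wait, NULL);
	j.barrier_count = 1;

	rc = pthread_create(&waiter, NULL, model_start_handle, &j);
	assert(rc == 0);

	model_unlock_updates(&j);
	/* The join returns only if the waiter was not left asleep. */
	pthread_join(waiter, NULL);

	final_count = j.barrier_count;
	pthread_cond_destroy(&j.transaction_wait);
	pthread_mutex_destroy(&j.state_lock);
	return final_count;
}
```

Calling model_run() from a small main() (built with -pthread) should 
return 0 whichever thread happens to run first, since the waiter either 
sees the count already at zero or is asleep on the condition variable 
when the broadcast arrives.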

All in all, I'm experiencing an unexpected delay in executing some ext3 
file system calls, and would like to get your thoughts as to whether the 
concern I have above, with a wake_up() possibly being a no-op, could 
explain these execution delays.

I appreciate your help tremendously.  If you folks aren't the ones I 
should be talking to, please point me in the correct direction.  I took 
this e-mail address from the MAINTAINERS file located in the kernel 
source tree.

--

Thank you,

Duane Cloud
Systems Programmer

Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)

cloud@xxxxxxxxxx, 612-337-3407 Desk
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
Linux mh01 2.6.5-7.252-ss #7 Sun Jun 18 04:25:58 PDT 2006 x86_64 x86_64 x86_64 GNU/Linux
 
Gnu C                  3.3.3
Gnu make               3.80
binutils               2.15.90.0.1.1
util-linux             2.12
mount                  2.12
module-init-tools      3.0-pre10
e2fsprogs              1.38.cfs2
jfsutils               1.1.7
xfsprogs               2.6.25
quota-tools            3.11.
PPP                    2.4.2
nfs-utils              1.0.6
Linux C Library        x  1 root root 1397474 Jun  3  2005 /lib64/tls/libc.so.6
Dynamic linker (ldd)   2.3.3
Linux C++ Library      5.0.6
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
Modules Loaded         osc llite lov mdc kptllnd ptlrpc lnet obdclass lvfs libcfs e1000 rca ippo portals
host> diff -puN jbd/transaction.c /usr/src/linux-2.6.5-7.252/fs/jbd/transaction.c

--- jbd/transaction.c	2006-08-14 12:25:43.424153474 -0500
+++ /usr/src/linux-2.6.5-7.252/fs/jbd/transaction.c	2006-07-30 10:29:27.000000000 -0500
@@ -125,13 +125,9 @@ repeat_locked:
 
 	/* Wait on the journal's transaction barrier if necessary */
 	if (journal->j_barrier_count) {
-		DEFINE_WAIT(wait);
-
-		prepare_to_wait(&journal->j_wait_transaction_locked,
-					&wait, TASK_UNINTERRUPTIBLE);
 		spin_unlock(&journal->j_state_lock);
-		schedule();
-		finish_wait(&journal->j_wait_transaction_locked, &wait);
+		wait_event(journal->j_wait_transaction_locked,
+				journal->j_barrier_count == 0);
 		goto repeat;
 	}
 
@@ -480,8 +476,8 @@ void journal_unlock_updates (journal_t *
 	up(&journal->j_barrier);
 	spin_lock(&journal->j_state_lock);
 	--journal->j_barrier_count;
-	wake_up(&journal->j_wait_transaction_locked);
 	spin_unlock(&journal->j_state_lock);
+	wake_up(&journal->j_wait_transaction_locked);
 }
 
 /*
@@ -1368,7 +1364,8 @@ int journal_stop(handle_t *handle)
 	transaction->t_updates--;
 	if (!transaction->t_updates) {
 		wake_up(&journal->j_wait_updates);
-		wake_up(&journal->j_wait_transaction_locked);
+		if (journal->j_barrier_count)
+			wake_up(&journal->j_wait_transaction_locked);
 	}
 
 	/* Move callbacks from the handle to the transaction. */