(Resending one of the last cases, as the two task scenario got mixed! ) > Now following is the case where the writing > process/jbd2_journal_start() calls read_lock() *before* > jbd2_journal_lock_updates() are called by the freezing process. > CPU1 CPU2 Task1 (write operation) Task2 tx read_lock(&journal->j_state_lock) tx+1 jbd2_journal_lock_updates() (will be stuck at the next instance...) tx+3 if(journal->j_barrier_count) write_lock(journal->j_state_lock) /* stuck here till Task1 relinquishes the lock*/ tx+4 handle started and associated with running transaction. tx+5 read_unlock(&journal->j_state_lock) tx+6 jbd2_journal_lock_updates() succeeds tx+7 jbd2_journal_flush() /* This shall flush the running transactions */ /* There are no outstanding transcations. No new ext4_journal_start() will succeed by this point */ > -------------------------------- > Thus: > > Though a racy transaction can really be started after SB_FREEZE_WRITE, > either that transaction will be flushed by jbd2_journal_flush or it > cannot be started at all because of the barrier count. > > So, I do not understand why the journaled writes should really have a > race window after this patch is applied? > Regards, Surbhi On Tue, Jan 10, 2012 at 9:38 PM, Surbhi Palande <csurbhi@xxxxxxxxx> wrote: > On second thoughts, I fail to see why there is still a race window > after this patch. > > Here are the reasons why i fail to see how the data can be dirtied > when all the operations involve a journal: > > ---------- > So here is the problem that we see > CPU1 CPU2 > Task1 (write operation) Task2 > --------------------------------------------------------------------------------------- > t1 ext4_journal_start() > t2 ext4_journal_start_sb() > t3 vfs_check_frozen sb->frozen=SB_FREEZE_WRITE > t4 jbd2_journal_start() /* hence forth all processes calling > vfs_check_frozen will wait */ > > Now, our aim is to stop Task1 from dirtying the page cache ie in > starting this transaction. However if it is successful in starting > this transaction, then we want to make sure that this transaction is > flushed out. > Correct? > > (with the journal freeze patch applied) > * Task1 is now executing the code of jbd2_journal_start(). Task2 is > the freezing process. > > CPU1 CPU2 > Task1 (write operation) Task2 > t4 jbd2_journal_start() > ... > tx read_lock(&journal->j_state_lock) > > Now two things can happen at this point with respect to Task2: > a) either journal->j_flags is set to JBD2_FROZEN already > b) or it is not set. > > > Lets look at both the cases: > A) When journal->j_flags is set to JBD2_FROZEN already then > jbd2_journal_start() will get stuck on the waitqueue as long as the > journal->j_flags is JBD2_FROZEN. So we cannot create dirty data in > this case. > > B) When journal->j_flags is not set to JBD2_ FROZEN then two more > things could happen: > Task2 has already finished the call to jbd2_journal_flush() by the > time Task1 calls read_lock(). So now no new task (T1) should create > any new dirty data as that will not get flushed out. So we really want > to stop the jbd2_journal_start() from succeeding. > > > CPU1 CPU2 > Task1 (write operation) Task2 > ---------------------------------------------------------------------------------------------- > tx read_lock(&journal->j_state_lock) > jbd2_journal_lock_updates() /*journal->j_barrier_count incremented - > so non zero ! */ > tx+1 jbd2_journal_flush() > tx+2 write_lock(&journal->j_state_lock); > /* till the read_lock is relinquished by > task1 , we are stuck */ > tx+3 if(journal->j_barrier_count) > read_unlock(&journal->j_state_lock); > > /* so we can set the journal->j_flags now as write_lock() > succeeds here */ > tx+4 goto repeat > > The above case is where the writing process/jbd2_journal_start() calls > read_lock() *after* jbd2_journal_lock_updates() are called by the > freezing process and hence is protected by the j_barrier_count check. > This transaction can definitely not start! > > Now following is the case where the writing > process/jbd2_journal_start() calls read_lock() *before* > jbd2_journal_lock_updates() are called by the freezing process. > > However, if Task1 already finished starting the transaction as follows: > CPU1 CPU2 > Task1 (write operation) > Task2 > ------------------------------------------------------------------------------------------------------------------------------------------------------------- > tx read_lock(&journal->j_state_lock) > tx+1 > jbd2_journal_lock_updates() (will be stuck at the next instance...) > tx+3 if(journal->j_barrier_count) > write_lock(journal->j_state_lock) > /* journal->j_barrier_count = 0, so we proceed here*/ > /* stuck here till Task1 relinquishes the lock*/ > tx+4 read_unlock(&journal->j_state_lock); > tx+5 now start_this_handle() returns successfully > and associates this handle with the running > transaction. > tx+6 > jbd2_journal_lock_updates() succeeds > tx+7 > jbd2_journal_flush() /* This shall flush the running transactions */ > /* There are no > outstanding transcations. No new ext4_journal_start() will succeed by > this point*/ > > -------------------------------- > Thus: > > Though a racy transaction can really be started after SB_FREEZE_WRITE, > either that transaction will be flushed by jbd2_journal_flush or it > cannot be started at all because of the barrier count. > > So, I do not understand why the journaled writes should really have a > race window after this patch is applied? > > Thanks! > > Regards, > Surbhi > > > > > > > > On Tue, Jan 10, 2012 at 4:13 PM, Surbhi Palande <csurbhi@xxxxxxxxx> wrote: >> Hi Jan, >> >>>> >>>> >>>> If all the write operations were journaled, then this patch would not allow >>>> ext4 filesystem to have any dirty data after its frozen. >>>> (as journal_start() would block). >>>> >>>> I think the only one candidate that creates dirty data without calling >>>> ext4_journal_start() is mmapped? >>> No, the problem is in any write path. The problem is with operations >>> that happen during the phase when s_frozen == SB_FREEZE_WRITE. These >>> operations dirty the filesystem but running sync may easily miss them. >>> During this phase journal is not frozen so that does not help you in any >>> way. >>> >>> Honza >> >> Ok! No new transaction can really start after the journal is frozen. >> But we can have dirty data after SB_FREEZE_WRITE and before >> SB_FREEZE_TRANS. >> I agree with you. However, can this be fixed by adding a >> sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is >> over? >> >> >> So then essentially, when freeze_super() returns, the page cache is clean? >> >> I do definitely agree that the fix is to add a lock for mutual >> exclusion between freeze filesystem and writes to a frozen filesystem. >> >> Thanks! >> >> Regards, >> Surbhi. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html