Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal

Surbhi Palande <csurbhi@xxxxxxxxx> · Tue, 10 Jan 2012 21:38:29 -0800

On second thoughts, I fail to see why there is still a race window
after this patch.

Here are the reasons why i fail to see how the data can be dirtied
when all the operations involve a journal:

----------
So here is the problem that we see
	CPU1						         CPU2
       Task1 (write operation)				          Task2
---------------------------------------------------------------------------------------
t1 	ext4_journal_start()
t2	  ext4_journal_start_sb()
t3	    vfs_check_frozen				sb->frozen=SB_FREEZE_WRITE
t4		jbd2_journal_start()			/* hence forth all processes calling
vfs_check_frozen will wait */

Now, our aim is to stop Task1 from dirtying the page cache ie in
starting this transaction. However if it is successful in starting
this transaction, then we want to make sure that this transaction is
flushed out.
Correct?

(with the journal freeze patch applied)
* Task1 is now executing the code of jbd2_journal_start(). Task2 is
the freezing process.

	CPU1						CPU2
       Task1 (write operation)				Task2
t4 	jbd2_journal_start()
...
tx	   read_lock(&journal->j_state_lock)		

Now two things can happen at this point with respect to Task2:
a) either journal->j_flags is set to JBD2_FROZEN already
b) or it is not set.

Lets look at both the cases:
A) When journal->j_flags is set to JBD2_FROZEN already then
jbd2_journal_start() will get stuck on the waitqueue as long as  the
journal->j_flags is JBD2_FROZEN. So we cannot create dirty data in
this case.

B) When journal->j_flags is not set to JBD2_ FROZEN then two more
things could happen:
Task2 has already finished the call to jbd2_journal_flush() by the
time Task1 calls read_lock(). So now no new task (T1) should create
any new dirty data as that will not get flushed out. So we really want
to stop the jbd2_journal_start() from succeeding.

	CPU1						              CPU2
       Task1 (write operation)				               Task2
----------------------------------------------------------------------------------------------
tx	   read_lock(&journal->j_state_lock)		
jbd2_journal_lock_updates() /*journal->j_barrier_count incremented -
so non zero ! */
tx+1							                     jbd2_journal_flush()
tx+2	   						                     write_lock(&journal->j_state_lock);
							                     /* till the read_lock is relinquished by
task1 , we are stuck */
tx+3 	if(journal->j_barrier_count)
		read_unlock(&journal->j_state_lock);

       /* so we can set the journal->j_flags now as write_lock()
succeeds here */
tx+4     goto repeat

The above case is where the writing process/jbd2_journal_start() calls
read_lock() *after* jbd2_journal_lock_updates() are called by the
freezing process and hence is protected by the j_barrier_count check.
This transaction can definitely not start!

Now following is the case where the writing
process/jbd2_journal_start() calls read_lock() *before*
jbd2_journal_lock_updates() are called by the freezing process.

However, if Task1 already finished starting the transaction as follows:
	CPU1						                                             CPU2
       Task1 (write operation)				
         Task2
-------------------------------------------------------------------------------------------------------------------------------------------------------------
tx	   read_lock(&journal->j_state_lock)		
tx+1							
jbd2_journal_lock_updates() (will be stuck at the next instance...)
tx+3 	    if(journal->j_barrier_count)			
write_lock(journal->j_state_lock)
            /* journal->j_barrier_count = 0, so we proceed here*/
               /* stuck here till Task1 relinquishes the lock*/
tx+4     read_unlock(&journal->j_state_lock);
tx+5 now start_this_handle() returns successfully
      and associates this handle with the running
	transaction.
tx+6							
jbd2_journal_lock_updates() succeeds
tx+7							
jbd2_journal_flush() /* This shall flush the running transactions */
	                            						               /* There are no
outstanding transcations. No new ext4_journal_start() will succeed by
this 									point*/

--------------------------------
Thus:

Though a racy transaction can really be started after SB_FREEZE_WRITE,
either that transaction will be flushed by jbd2_journal_flush or it
cannot be started at all because of the barrier count.

So, I do not understand why the journaled writes should really have a
race window after this patch is applied?

Thanks!

Regards,
Surbhi

On Tue, Jan 10, 2012 at 4:13 PM, Surbhi Palande <csurbhi@xxxxxxxxx> wrote:
> Hi Jan,
>
>>>
>>>
>>> If all the write operations were journaled, then this patch would not allow
>>> ext4 filesystem to have any dirty data after its frozen.
>>> (as journal_start() would block).
>>>
>>>  I think the only one candidate that creates dirty data without calling
>>> ext4_journal_start() is mmapped?
>>  No, the problem is in any write path. The problem is with operations
>> that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
>> operations dirty the filesystem but running sync may easily miss them.
>> During this phase journal is not frozen so that does not help you in any
>> way.
>>
>>                                                                Honza
>
> Ok! No new transaction can really start after the journal is frozen.
> But we can have dirty data after SB_FREEZE_WRITE and before
> SB_FREEZE_TRANS.
> I agree with you. However, can this be fixed by adding a
> sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
> over?
>
>
> So then essentially, when freeze_super() returns, the page cache is clean?
>
> I do definitely agree that the fix is to add a lock for mutual
> exclusion between freeze filesystem and writes to a frozen filesystem.
>
> Thanks!
>
> Regards,
> Surbhi.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html