Re: [PATCH v2 1/7] Adding support to freeze and unfreeze a journal

Surbhi Palande <csurbhi@xxxxxxxxx> · Tue, 10 Jan 2012 22:06:57 -0800

(Resending one of the last cases, as the two task scenario got mixed! )

> Now following is the case where the writing
> process/jbd2_journal_start() calls read_lock() *before*
> jbd2_journal_lock_updates() are called by the freezing process.
>

        CPU1						                              CPU2
    Task1 (write operation)				                               Task2
tx	   read_lock(&journal->j_state_lock)		
tx+1							               jbd2_journal_lock_updates() (will be stuck
at the next instance...)
tx+3 	if(journal->j_barrier_count)			
write_lock(journal->j_state_lock) /* stuck here till Task1
relinquishes the lock*/
tx+4  handle started and associated with
        running transaction.
tx+5   read_unlock(&journal->j_state_lock)
tx+6							               jbd2_journal_lock_updates() succeeds
tx+7							               jbd2_journal_flush() /* This shall flush the
running transactions */
							                 /* There are no outstanding transcations. No
new ext4_journal_start() will succeed by this point */
> --------------------------------
> Thus:
>
> Though a racy transaction can really be started after SB_FREEZE_WRITE,
> either that transaction will be flushed by jbd2_journal_flush or it
> cannot be started at all because of the barrier count.
>
> So, I do not understand why the journaled writes should really have a
> race window after this patch is applied?
>

Regards,
Surbhi

On Tue, Jan 10, 2012 at 9:38 PM, Surbhi Palande <csurbhi@xxxxxxxxx> wrote:
> On second thoughts, I fail to see why there is still a race window
> after this patch.
>
> Here are the reasons why i fail to see how the data can be dirtied
> when all the operations involve a journal:
>
> ----------
> So here is the problem that we see
>        CPU1                                                     CPU2
>       Task1 (write operation)                                    Task2
> ---------------------------------------------------------------------------------------
> t1      ext4_journal_start()
> t2        ext4_journal_start_sb()
> t3          vfs_check_frozen                            sb->frozen=SB_FREEZE_WRITE
> t4              jbd2_journal_start()                    /* hence forth all processes calling
> vfs_check_frozen will wait */
>
> Now, our aim is to stop Task1 from dirtying the page cache ie in
> starting this transaction. However if it is successful in starting
> this transaction, then we want to make sure that this transaction is
> flushed out.
> Correct?
>
> (with the journal freeze patch applied)
> * Task1 is now executing the code of jbd2_journal_start(). Task2 is
> the freezing process.
>
>        CPU1                                            CPU2
>       Task1 (write operation)                          Task2
> t4      jbd2_journal_start()
> ...
> tx         read_lock(&journal->j_state_lock)
>
> Now two things can happen at this point with respect to Task2:
> a) either journal->j_flags is set to JBD2_FROZEN already
> b) or it is not set.
>
>
> Lets look at both the cases:
> A) When journal->j_flags is set to JBD2_FROZEN already then
> jbd2_journal_start() will get stuck on the waitqueue as long as  the
> journal->j_flags is JBD2_FROZEN. So we cannot create dirty data in
> this case.
>
> B) When journal->j_flags is not set to JBD2_ FROZEN then two more
> things could happen:
> Task2 has already finished the call to jbd2_journal_flush() by the
> time Task1 calls read_lock(). So now no new task (T1) should create
> any new dirty data as that will not get flushed out. So we really want
> to stop the jbd2_journal_start() from succeeding.
>
>
>        CPU1                                                          CPU2
>       Task1 (write operation)                                         Task2
> ----------------------------------------------------------------------------------------------
> tx         read_lock(&journal->j_state_lock)
> jbd2_journal_lock_updates() /*journal->j_barrier_count incremented -
> so non zero ! */
> tx+1                                                                         jbd2_journal_flush()
> tx+2                                                                         write_lock(&journal->j_state_lock);
>                                                                             /* till the read_lock is relinquished by
> task1 , we are stuck */
> tx+3    if(journal->j_barrier_count)
>                read_unlock(&journal->j_state_lock);
>
>       /* so we can set the journal->j_flags now as write_lock()
> succeeds here */
> tx+4     goto repeat
>
> The above case is where the writing process/jbd2_journal_start() calls
> read_lock() *after* jbd2_journal_lock_updates() are called by the
> freezing process and hence is protected by the j_barrier_count check.
> This transaction can definitely not start!
>
> Now following is the case where the writing
> process/jbd2_journal_start() calls read_lock() *before*
> jbd2_journal_lock_updates() are called by the freezing process.
>
> However, if Task1 already finished starting the transaction as follows:
>        CPU1                                                                                         CPU2
>       Task1 (write operation)
>         Task2
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> tx         read_lock(&journal->j_state_lock)
> tx+1
> jbd2_journal_lock_updates() (will be stuck at the next instance...)
> tx+3        if(journal->j_barrier_count)
> write_lock(journal->j_state_lock)
>            /* journal->j_barrier_count = 0, so we proceed here*/
>               /* stuck here till Task1 relinquishes the lock*/
> tx+4     read_unlock(&journal->j_state_lock);
> tx+5 now start_this_handle() returns successfully
>      and associates this handle with the running
>        transaction.
> tx+6
> jbd2_journal_lock_updates() succeeds
> tx+7
> jbd2_journal_flush() /* This shall flush the running transactions */
>                                                                                               /* There are no
> outstanding transcations. No new ext4_journal_start() will succeed by
> this                                                                    point*/
>
> --------------------------------
> Thus:
>
> Though a racy transaction can really be started after SB_FREEZE_WRITE,
> either that transaction will be flushed by jbd2_journal_flush or it
> cannot be started at all because of the barrier count.
>
> So, I do not understand why the journaled writes should really have a
> race window after this patch is applied?
>
> Thanks!
>
> Regards,
> Surbhi
>
>
>
>
>
>
>
> On Tue, Jan 10, 2012 at 4:13 PM, Surbhi Palande <csurbhi@xxxxxxxxx> wrote:
>> Hi Jan,
>>
>>>>
>>>>
>>>> If all the write operations were journaled, then this patch would not allow
>>>> ext4 filesystem to have any dirty data after its frozen.
>>>> (as journal_start() would block).
>>>>
>>>>  I think the only one candidate that creates dirty data without calling
>>>> ext4_journal_start() is mmapped?
>>>  No, the problem is in any write path. The problem is with operations
>>> that happen during the phase when s_frozen == SB_FREEZE_WRITE. These
>>> operations dirty the filesystem but running sync may easily miss them.
>>> During this phase journal is not frozen so that does not help you in any
>>> way.
>>>
>>>                                                                Honza
>>
>> Ok! No new transaction can really start after the journal is frozen.
>> But we can have dirty data after SB_FREEZE_WRITE and before
>> SB_FREEZE_TRANS.
>> I agree with you. However, can this be fixed by adding a
>> sync_filesystem() in freeze_super() after the sb->s_op->freeze_fs() is
>> over?
>>
>>
>> So then essentially, when freeze_super() returns, the page cache is clean?
>>
>> I do definitely agree that the fix is to add a lock for mutual
>> exclusion between freeze filesystem and writes to a frozen filesystem.
>>
>> Thanks!
>>
>> Regards,
>> Surbhi.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html