On Tue 16-02-10 15:40:22, Kailas Joshi wrote:
> On 15 February 2010 20:30, Jan Kara <jack@xxxxxxx> wrote:
> > On Sat 13-02-10 14:13:17, Kailas Joshi wrote:
> >> On 13 February 2010 01:37, <tytso@xxxxxxx> wrote:
> >> > On Fri, Feb 12, 2010 at 08:52:15AM +0530, Kailas Joshi wrote:
> >> >> Sorry, I didn't understand why processes need to be suspended.
> >> >> In my scheme, I am issuing the magic handle only after locking the
> >> >> current transaction. AFAIK, after the transaction is locked it can
> >> >> still receive block journaling requests for already created handles
> >> >> (in our case, for already reserved journal space), while new
> >> >> concurrent requests for journal_start() go to the new current
> >> >> transaction. Since the credits for the locked transaction are fixed
> >> >> (by means of early reservations), we can know whether the journal
> >> >> has enough space for the new journal_start(). So, as long as the
> >> >> journal has enough space available, new processes need not be
> >> >> stalled.
> >> >
> >> > But while you are modifying blocks that need to go into the journal
> >> > via the locked (old) transaction, it's not safe to start a new
> >> > transaction and start issuing handles against the new transaction.
> >> >
> >> > Just to give one example, suppose we need to update the extent
> >> > allocation tree for an inode in the locked/committing transaction as
> >> > the delayed allocation blocks are being resolved --- and in another
> >> > process, that inode is getting truncated or unlinked, which also
> >> > needs to modify the extent allocation tree? Hilarity ensues, unless
> >> > you block all attempts to create a new handle (practically speaking,
> >> > by blocking all attempts to start a new transaction) until this new
> >> > delayed allocation resolution phase which you have proposed is
> >> > complete.
> >> Okay. So process stalling is basically unavoidable, since we cannot
> >> modify a buffer's data in a past transaction after it has been
> >> modified in the current transaction.
> >> Can we restrict the scope of this blocking? Blocking in
> >> journal_start() stalls all processes, even those operating on
> >> mutually exclusive sets of metadata buffers. Could we instead
> >> restrict the blocking to allocation/deallocation paths by blocking
> >> in get_write_access() in specific cases (based on some condition on
> >> the buffer)? That way, since all files will use commit-time
> >> allocation, very few file operations (only sync and direct-IO ones)
> >> would be stalled.
> > I doubt blocking at the buffer level would be enough. I think the
> > journalling layer just does not have enough information for such
> > decisions. It could be feasible to block on a per-inode basis, but
> > you'd still have to give a good thought to modifications of global
> > filesystem structures like bitmaps, the superblock, or inode blocks.
> Okay. So blocking at the buffer level will not be easy, because global
> structures shared among inodes will need modifications (for example,
> changing the access time for a file means updating its inode block).
Yes.
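Just to make the terminology concrete, the per-handle flow we keep
referring to looks roughly like the sketch below (jbd-style names as
used by ext3 at the time; error handling is trimmed, so take it as an
illustration of where journal_start() and get_write_access() sit rather
than as real filesystem code):

#include <linux/jbd.h>
#include <linux/buffer_head.h>
#include <linux/err.h>

/* Modify one metadata buffer under a freshly started handle. */
static int modify_one_metadata_block(journal_t *journal,
				     struct buffer_head *bh)
{
	handle_t *handle;
	int err;

	/*
	 * Reserve one buffer's worth of journal space in the running
	 * transaction.  This is the call that sleeps when the running
	 * transaction is being locked down for commit.
	 */
	handle = journal_start(journal, 1);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/* Declare that this buffer will be modified under the handle. */
	err = journal_get_write_access(handle, bh);
	if (err)
		goto out;

	/* ... modify bh->b_data here ... */

	/* File the buffer as journaled metadata in this transaction. */
	err = journal_dirty_metadata(handle, bh);
out:
	/* Drop the handle; the transaction may now be committed. */
	journal_stop(handle);
	return err;
}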
> One last doubt: while looking at the code, I saw that journal_start()
> always stalls all file operations while the currently running
> transaction is in the LOCKED state. Only when the current transaction
> moves on to the FLUSH state is a new transaction created and the
> stalled operations continue. Is this interpretation correct?
Yes, it is correct.

> If yes, why does this stalling not have a significant negative impact
> on the performance of file operations? And if it does not, will
> stalling for delayed block allocation really have such a significant
> negative impact?
Actually, stalling on a transaction in the LOCKED state does have a
negative impact on filesystem performance, but it is hard to avoid. The
transaction is in the LOCKED state when we have decided it needs a
commit but there are still tasks which hold a handle against it and are
adding new metadata buffers to it. So this transaction is effectively
still running, and we cannot start the next transaction because then we
would have two running transactions, which the journalling logic is not
able to handle.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
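For reference, the stall described above corresponds roughly to the
following condensed paraphrase of the wait in start_this_handle()
(fs/jbd/transaction.c). It is not the verbatim kernel code; locking
details and bookkeeping are simplified, but it shows why every new
journal_start() sleeps while the running transaction sits in T_LOCKED:

#include <linux/jbd.h>
#include <linux/sched.h>
#include <linux/wait.h>

/* Sketch of the wait that stalls new handles on a locked transaction. */
static void wait_while_transaction_is_locked(journal_t *journal)
{
	DEFINE_WAIT(wait);

repeat:
	spin_lock(&journal->j_state_lock);
	if (journal->j_running_transaction &&
	    journal->j_running_transaction->t_state == T_LOCKED) {
		/*
		 * Commit has decided to close this transaction, but tasks
		 * that already hold a handle may still be adding buffers
		 * to it, so it is effectively still running.  A second
		 * running transaction is not allowed, so every new
		 * journal_start() sleeps here until the transaction
		 * leaves the LOCKED state.
		 */
		prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
				TASK_UNINTERRUPTIBLE);
		spin_unlock(&journal->j_state_lock);
		schedule();
		finish_wait(&journal->j_wait_transaction_locked, &wait);
		goto repeat;
	}
	spin_unlock(&journal->j_state_lock);
	/* ... join the running transaction or start a new one ... */
}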