Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

Jan Kara <jack@xxxxxxx> · Thu, 15 Sep 2011 18:22:12 +0200

On Wed 14-09-11 16:53:38, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <jack@xxxxxxx> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >>       long write_chunk;
> >>       long wrote = 0;  /* count both pages and inodes */
> >>
> >> +     if (vfs_is_frozen(sb))
> >> +             return 0;
> >> +
> >  Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
> 
> Sorry, I don't quite understand what the goal is here?  I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
  My worry is, that when we just bail out from writeback_sb_inodes(), we
leave dirty inodes on b_io list. That is a unique behavior in writeback
code since all other places either put inode to b_more_io list for a quick
retry or redirty the inode to retry it later.

  Emptyness of b_io list is used for example to detect whether we should
queue more inodes to b_io list and also busylooping logic kind of doesn't
count with inodes staying at b_io list. So although I don't see an
immediate problem with your solution it does add a new special case.

  After some more thought I think the cleanest would be to just call
redirty_tail() if an inode is on frozen superblock in the beginning of the
loop in writeback_sb_inodes(). It won't be as efficient but much more in
line how the rest of the writeback code works.

> >>       while (!list_empty(&wb->b_io)) {
> >>               struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >>   * writeback_inodes_sb_if_idle       -       start writeback if none underway
> >>   * @sb: the superblock
> >>   *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock.  Returns 1 if writeback
> >> + * was started, 0 if not.
> >>   */
> >>  int writeback_inodes_sb_if_idle(struct super_block *sb)
> >>  {
> >>       if (!writeback_in_progress(sb->s_bdi)) {
> >> -             down_read(&sb->s_umount);
> >> -             writeback_inodes_sb(sb);
> >> -             up_read(&sb->s_umount);
> >> -             return 1;
> >> -     } else
> >> -             return 0;
> >> +             if (down_read_trylock(&sb->s_umount)) {
> >  What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
> 
> The trylock is an optimization Dave Chinner suggested.  The first
> version I wrote acquired the lock and then checked vfs_is_frozen().
> 
> This function and the similar following one each have one caller,
> btrfs in one case and ext4 in the other, and they are both trying to
> get more writes to go out to disk in the hope of freeing up disk
> space.
  Ah right. Maybe a short comment that it's just an optimization for
opportunictic writeout would be nice.

> >> +                     writeback_inodes_sb(sb);
> >> +                     up_read(&sb->s_umount);
> >> +                     return 1;
> >> +             }
> >> +     }
> >> +     return 0;

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html