Oops, sorry, responding from a different computer from the one I
normally use, due to Thanksgiving.

						- Ted

----- Forwarded message from Ted Ts'o <tytso@xxxxxxx> -----

Date: Fri, 28 Nov 2014 13:14:21 -0500
From: Ted Ts'o <tytso@xxxxxxx>
To: Jan Kara <jack@xxxxxxx>
Subject: Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
User-Agent: Mutt/1.5.21 (2010-09-15)

On Fri, Nov 28, 2014 at 06:23:23PM +0100, Jan Kara wrote:
>   Hum, when someone calls fsync() for an inode, you likely want to sync
> timestamps to disk even if everything else is clean. I think that doing
> what you did in the last version:
> 	dirty = inode->i_state & I_DIRTY_INODE;
> 	inode->i_state &= ~I_DIRTY_INODE;
> 	spin_unlock(&inode->i_lock);
> 	if (dirty & I_DIRTY_TIME)
> 		mark_inode_dirty_sync(inode);
> looks better to me. IMO when someone calls __writeback_single_inode() we
> should write whatever we have...

Yes, but we also have to distinguish between what happens on an
fsync() versus what happens on a periodic writeback if I_DIRTY_PAGES
(but not I_DIRTY_SYNC or I_DIRTY_DATASYNC) is set.  So there is a
check in the fsync() code path to handle the concern you raised above.

> > +EXPORT_SYMBOL(inode_requeue_dirtytime);
> > +
>   This function has a single call site - update_time(). I'd prefer to
> handle this as a special case of __mark_inode_dirty() to have all the
> dirty queueing in one place...

It was originally also called by the ext4 optimization earlier; but
now that we've dropped it, sure, we can fold this back in.

> > +/*
> > + * Take all of the inodes on the dirty_time list, and mark them as
> > + * dirty, so they will be written out.
> > + */
> > +static void flush_sb_dirty_time(struct super_block *sb)
> > +{
> > +	struct bdi_writeback *wb = &sb->s_bdi->wb;
> > +	LIST_HEAD(tmp);
> > +
> > +	spin_lock(&wb->list_lock);
> > +	list_cut_position(&tmp, &wb->b_dirty_time, wb->b_dirty_time.prev);
> > +	while (!list_empty(&tmp)) {
> > +		struct inode *inode = wb_inode(tmp.prev);
> > +
> > +		list_del_init(&inode->i_wb_list);
> > +		spin_unlock(&wb->list_lock);
> > +		if (inode->i_state & I_DIRTY_TIME)
> > +			mark_inode_dirty_sync(inode);
> > +		spin_lock(&wb->list_lock);
> > +	}
> > +	spin_unlock(&wb->list_lock);
> > +}
> > +
>   This shouldn't be necessary when you somewhat tweak what you do in
> queue_io().

Right, I can key off of wb->reason == WB_REASON_SYNC.  Will fix next
time out.

						- Ted

> > -	if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock))
> > -		iput_final(inode);
> > +	if (!inode)
> > +		return;
> > +	BUG_ON(inode->i_state & I_CLEAR);
> > +retry:
> > +	if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
> > +		if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
> > +			atomic_inc(&inode->i_count);
> > +			inode->i_state &= ~I_DIRTY_TIME;
> > +			spin_unlock(&inode->i_lock);
> > +			mark_inode_dirty_sync(inode);
> > +			goto retry;
> > +		}
> > +		iput_final(inode);
>   How about my suggestion from the previous version to avoid the retry
> loop by checking I_DIRTY_TIME before atomic_dec_and_lock()?

I looked at doing

	if (atomic_read(&inode->i_count) == 1 &&
	    (inode->i_state & I_DIRTY_TIME))

but that's vulnerable to racing calls to iput(), and if the inode then
gets evicted, we could end up losing the timestamp update.  For my use
cases, I really couldn't care less, but for correctness's sake, it
seemed better to keep the retry loop and do the test under the lock.

						- Ted

----- End forwarded message -----
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html