Re: [PATCH] ext4: journalled quota speed-up

  Hi,

On Thu 17-12-09 04:15:34, Dmitry Monakhov wrote:
>  Currently each quota modification results in write_dquot()
>  and later dquot_commit().  This means that each quota modification
>  function must wait for dqio_mutex, which is a *huge* performance
>  penalty on big SMP systems. AFAIU the core idea of this implementation
>  is to guarantee that each quota modification is written to the journal
>  atomically. But in fact this is not always true, because a dquot may be
>  changed after a modification, but before it has been committed to disk.
> 
>  | Task 1                           | Task 2                      |
>  | alloc_space(nr)                  |                             |
>  | ->spin_lock(dq_data_lock)        |                             |
>  | ->curspace += nr                 |                             |
>  | ->spin_unlock(dq_data_lock)      |                             |
>  | ->mark_dirty()                   | free_space(nr)              |
>  | -->write_dquot()                 | ->spin_lock(dq_data_lock)   |
>  | --->dquot_commit()               | ->curspace -= nr            |
>  | ---->commit_dqblk()              | ->spin_unlock(dq_data_lock) |
>  | ----->spin_lock(dq_data_lock)    |                             |
>  | ----->mem2disk_dqblk(ddq, dquot) | <<< Copy updated value      |
>  | ----->spin_unlock(dq_data_lock)  |                             |
>  | ----->quota_write()              |                             |
> Quota corruption does not happen only because the caller of the quota
> modification has already started a journal handle, and ext3/4 allow only
> one running transaction at a time.
  It was actually designed to work this way. Ext3/4 handles, for example,
inodes like this as well... You're right that it might be good to document
this fact, as in the future someone might come up with a filesystem with a
different scheme.

> Let's exploit this fact and avoid writing the quota on each modification.
> Act similarly to dirty_for_io in the generic writeback path of the page
> cache: if we find that another task has already started to copy and write
> the dquot, we skip the actual quota_write stage and let that task do the
> job.
> Side effect: a task which skips the real quota_write() will not see an
> error (if any). But this is not a big deal because:
>  1) Any error has global consequences (RO remount, error messages, etc).
>  2) Real IO is deferred until the journal commit.
...
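  So, if I read the elided context right, the change boils down to the
following pattern in the journalled-quota write path (my schematic sketch,
not the exact hunk - I'm assuming the usual ext4_journal_start() preamble
of ext4_write_dquot() around it):

	handle = ext4_journal_start(dquot_to_inode(dquot),
				    EXT4_QUOTA_TRANS_BLOCKS(dquot->dq_sb));
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	if (!__dquot_mark_dquot_dirty(dquot))
		/* we flipped DQ_MOD_B, so we copy the dquot to the quota file */
		ret = dquot_commit(dquot);
	else
		/* someone else in this transaction will write the dquot */
		ret = 0;
	err = ext4_journal_stop(handle);
	if (!ret)
		ret = err;
	return ret;
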
> +	if (!__dquot_mark_dquot_dirty(dquot))
> +		ret = dquot_commit(dquot);
  Hmm, but dquot_commit() clears the dirty bit just after acquiring
dqio_mutex, so if I get it right, you save something only if someone is
already waiting on dqio_mutex. So isn't this just working around contention
on that mutex (and in a not very nice way)? For the case where the dquot is
already active (DQ_ACTIVE_B set), we can certainly make the locking much
more lightweight - essentially just a per-dquot lock held for the time until
we copy the data from the dquot to the buffer_head...
  Also, we could make writing a dquot more lightweight (have an
ext4-specific part of the dquot and cache there the buffer head which
carries the dquot). Even more, we could track whether the dquot is already
part of the running transaction and save the get_write_access() etc. for
that case (but that already starts to be ugly)...
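  Roughly, for the DQ_ACTIVE_B case I imagine something along these lines
(just a sketch to illustrate the idea; struct ext4_dquot, dq_io_lock, dq_bh,
dq_off and ext4_quota_mem2disk() are made-up names, and the quota format
would have to export a helper doing the mem2disk copy):

struct ext4_dquot {
	struct dquot		dq_dquot;	/* generic dquot */
	struct mutex		dq_io_lock;	/* guards the dquot -> bh copy */
	struct buffer_head	*dq_bh;		/* bh carrying the on-disk entry */
	unsigned int		dq_off;		/* offset of the entry in dq_bh */
};

static int ext4_commit_active_dquot(handle_t *handle, struct ext4_dquot *edq)
{
	int err;

	/* DQ_ACTIVE_B is set, so dq_bh / dq_off are valid and stable */
	mutex_lock(&edq->dq_io_lock);
	err = ext4_journal_get_write_access(handle, edq->dq_bh);
	if (!err) {
		spin_lock(&dq_data_lock);
		/* hypothetical helper doing what mem2disk_dqblk() does today */
		ext4_quota_mem2disk(edq->dq_bh->b_data + edq->dq_off,
				    &edq->dq_dquot);
		spin_unlock(&dq_data_lock);
		err = ext4_handle_dirty_metadata(handle, NULL, edq->dq_bh);
	}
	mutex_unlock(&edq->dq_io_lock);
	return err;
}

No dqio_mutex in the common case, and the per-dquot lock is held only across
the short copy into the journalled buffer.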

> +	else
> +		/*
> +		 * Dquot was already dirty. This means that another task has
> +		 * already started a transaction but has not cleared the dirty
> +		 * bit yet (see dquot_commit). Since only one running
> +		 * transaction is possible at a time, that task belongs to the
> +		 * same transaction and we don't have to actually write the
> +		 * dquot changes - that task will write them for us.
> +		 */
> +		ret = 0;
> +	err = ext4_journal_stop(handle);
> +	if (!ret)
> +		ret = err;
> +	return ret;
>  }
>  
>  static int ext4_write_info(struct super_block *sb, int type)
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 6979224..e03eea0 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -311,14 +311,24 @@ static inline int mark_dquot_dirty(struct dquot *dquot)
>  {
>  	return dquot->dq_sb->dq_op->mark_dirty(dquot);
>  }
> -
> -int dquot_mark_dquot_dirty(struct dquot *dquot)
> +/* Mark dquot dirty in an atomic manner and return the old dirty state */
> +int __dquot_mark_dquot_dirty(struct dquot *dquot)
>  {
> +	int ret = 1;
>  	spin_lock(&dq_list_lock);
> -	if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
> +	if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags)) {
>  		list_add(&dquot->dq_dirty, &sb_dqopt(dquot->dq_sb)->
>  				info[dquot->dq_type].dqi_dirty_list);
> +		ret = 0;
> +	}
>  	spin_unlock(&dq_list_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(__dquot_mark_dquot_dirty);
> +
> +int dquot_mark_dquot_dirty(struct dquot *dquot)
> +{
> +	__dquot_mark_dquot_dirty(dquot);
>  	return 0;
>  }
  Please just do the change to dquot_mark_dquot_dirty(). There's no need
for the new function.
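  I.e. just something like this (a sketch; note the return value changes
meaning from "always 0" to "previous dirty state", which the existing
callers that ignore it won't notice):

int dquot_mark_dquot_dirty(struct dquot *dquot)
{
	int was_dirty = 1;

	spin_lock(&dq_list_lock);
	if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags)) {
		/* First dirtier since the last write - put it on the list */
		list_add(&dquot->dq_dirty, &sb_dqopt(dquot->dq_sb)->
				info[dquot->dq_type].dqi_dirty_list);
		was_dirty = 0;
	}
	spin_unlock(&dq_list_lock);
	return was_dirty;
}
EXPORT_SYMBOL(dquot_mark_dquot_dirty);

and then ext4_mark_dquot_dirty() can act on the return value directly - no
second exported helper needed.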

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
