Re: [RFC v3 09/13] vfs: add one wq to update map info periodically

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 16 Oct 2012 11:27:44 +1100

On Wed, Oct 10, 2012 at 06:07:31PM +0800, zwu.kernel@xxxxxxxxx wrote:
> From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
> 
>   Add a per-superblock workqueue and a work_struct
>  to run periodic work to update map info on each superblock.
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
> ---
>  fs/hot_tracking.c            |   94 ++++++++++++++++++++++++++++++++++++++++++
>  fs/hot_tracking.h            |    3 +
>  include/linux/hot_tracking.h |    2 +
>  3 files changed, 99 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c
> index a8dc599..f333c47 100644
> --- a/fs/hot_tracking.c
> +++ b/fs/hot_tracking.c
> @@ -15,6 +15,8 @@
>  #include <linux/module.h>
>  #include <linux/spinlock.h>
>  #include <linux/hardirq.h>
> +#include <linux/kthread.h>
> +#include <linux/freezer.h>
>  #include <linux/fs.h>
>  #include <linux/blkdev.h>
>  #include <linux/types.h>
> @@ -623,6 +625,88 @@ static void hot_map_array_exit(struct hot_info *root)
>  }
>  
>  /*
> + * Update temperatures for each hot inode item and
> + * hot range item for aging purposes
> + */
> +static void hot_temperature_update_work(struct work_struct *work)
> +{
> +	struct hot_update_work *hot_work =
> +			container_of(work, struct hot_update_work, work);
> +	struct hot_info *root = hot_work->hot_info;
> +	struct hot_inode_item *hi_nodes[8];
> +	unsigned long delay = HZ * HEAT_UPDATE_DELAY;
> +	u64 ino = 0;
> +	int i, n;
> +
> +	do {
> +		while (1) {
> +			spin_lock(&root->lock);
> +			n = radix_tree_gang_lookup(&root->hot_inode_tree,
> +					   (void **)hi_nodes, ino,
> +					   ARRAY_SIZE(hi_nodes));
> +			if (!n) {
> +				spin_unlock(&root->lock);
> +				break;
> +			}
> +
> +			ino = hi_nodes[n - 1]->i_ino + 1;
> +			for (i = 0; i < n; i++) {
> +				kref_get(&hi_nodes[i]->hot_inode.refs);
> +				hot_map_array_update(
> +					&hi_nodes[i]->hot_inode.hot_freq_data, root);
> +				hot_range_update(hi_nodes[i], root);
> +				hot_inode_item_put(hi_nodes[i]);
> +			}
> +			spin_unlock(&root->lock);

This is a lot of work to do under a spin lock. Perhaps you should
get a reference on all the nodes, then drop the root->lock and then
update all the nodes in a separate loop.

> +		}
> +
> +		if (unlikely(freezing(current))) {
> +			__refrigerator(true);
> +		} else {
> +			set_current_state(TASK_INTERRUPTIBLE);
> +			if (!kthread_should_stop()) {
> +				schedule_timeout(delay);
> +			}
> +			__set_current_state(TASK_RUNNING);
> +		}
> +	} while (!kthread_should_stop());

I don't think you understand workqueues fully. A work queue worker
function is not something that executes endlessly. It is a
"one-shot" function that does the work once, not an endless loop
that has to delay it's execution for periodic work.

If you need periodic work, then you should use a struct delayed_work
and queue the next work iteration to be run a later time. See, for
example, xfs_syncd_worker() and xfs_syncd_queue_sync() and how that
reschedules itself for periodic work. It also means you don't have
to handle kthread freezing, as the WQ infrastructure takes care of
that for you.

This is why unmount is hanging for me - this work never completes,
so flush_workqueue() will never return.

> +}
> +
> +static int hot_wq_init(struct hot_info *root)
> +{
> +	struct hot_update_work *hot_work;
> +	int ret = 0;
> +
> +	root->update_wq = alloc_workqueue(
> +		"hot_temperature_update", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
> +	if (!root->update_wq) {
> +		printk(KERN_ERR "%s: failed to create "
> +			"temperature update workqueue\n",
> +			__func__);
> +		return 1;
> +	}
> +
> +	hot_work = kmalloc(sizeof(*hot_work), GFP_NOFS);
> +	if (hot_work) {
> +		hot_work->hot_info = root;
> +		INIT_WORK(&hot_work->work, hot_temperature_update_work);
> +		queue_work(root->update_wq, &hot_work->work);
> +	} else {
> +		printk(KERN_ERR "%s: failed to create update work\n",
> +				__func__);
> +		ret = 1;
> +	}

I don't understand why you need a separate "hot_work" structure.
just embed a struct delayed_work in the struct hot_info and use
container_of() to get the struct hot_info from the work structure.
As such, there's no need for a separate function just for this
initialisation - just put it in line.

> +
> +	return ret;
> +}
> +
> +static void hot_wq_exit(struct workqueue_struct *wq)
> +{
> +	flush_workqueue(wq);

flush_workqueue_sync().

> +	destroy_workqueue(wq);
> +}

And there's not need for separate function for this - put it in
line.

FWIW, it also leaks the hot_work structure, but you're going to
remove that anyway. ;)

> diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h
> index d19e64a..7a79a6d 100644
> --- a/fs/hot_tracking.h
> +++ b/fs/hot_tracking.h
> @@ -36,6 +36,9 @@
>   */
>  #define TIME_TO_KICK 400
>  
> +/* set how often to update temperatures (seconds) */
> +#define HEAT_UPDATE_DELAY 400

FWIW, 400 seconds is an unusual time period. It's expected that
periodic work might take place at intervals of 5 minutes, 10
minutes, etc, not 6m40s. It's much easier to predict and understand
behaviour if it's at a interval of whole units like minutes,
especially when looking at timestamped event traces. Hence 300s (5
minutes) makes a lot more sense as a period for updates...

>  /*
>   * The following comments explain what exactly comprises a unit of heat.
>   *
> diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h
> index 7114179..b37e0f8 100644
> --- a/include/linux/hot_tracking.h
> +++ b/include/linux/hot_tracking.h
> @@ -84,6 +84,8 @@ struct hot_info {
>  
>  	/* map of range temperature */
>  	struct hot_map_head heat_range_map[HEAT_MAP_SIZE];
> +
> +	struct workqueue_struct *update_wq;

Add the struct delayed_work here, too.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html