On Wed, Oct 10, 2012 at 06:07:31PM +0800, zwu.kernel@xxxxxxxxx wrote: > From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx> > > Add a per-superblock workqueue and a work_struct > to run periodic work to update map info on each superblock. > > Signed-off-by: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx> > --- > fs/hot_tracking.c | 94 ++++++++++++++++++++++++++++++++++++++++++ > fs/hot_tracking.h | 3 + > include/linux/hot_tracking.h | 2 + > 3 files changed, 99 insertions(+), 0 deletions(-) > > diff --git a/fs/hot_tracking.c b/fs/hot_tracking.c > index a8dc599..f333c47 100644 > --- a/fs/hot_tracking.c > +++ b/fs/hot_tracking.c > @@ -15,6 +15,8 @@ > #include <linux/module.h> > #include <linux/spinlock.h> > #include <linux/hardirq.h> > +#include <linux/kthread.h> > +#include <linux/freezer.h> > #include <linux/fs.h> > #include <linux/blkdev.h> > #include <linux/types.h> > @@ -623,6 +625,88 @@ static void hot_map_array_exit(struct hot_info *root) > } > > /* > + * Update temperatures for each hot inode item and > + * hot range item for aging purposes > + */ > +static void hot_temperature_update_work(struct work_struct *work) > +{ > + struct hot_update_work *hot_work = > + container_of(work, struct hot_update_work, work); > + struct hot_info *root = hot_work->hot_info; > + struct hot_inode_item *hi_nodes[8]; > + unsigned long delay = HZ * HEAT_UPDATE_DELAY; > + u64 ino = 0; > + int i, n; > + > + do { > + while (1) { > + spin_lock(&root->lock); > + n = radix_tree_gang_lookup(&root->hot_inode_tree, > + (void **)hi_nodes, ino, > + ARRAY_SIZE(hi_nodes)); > + if (!n) { > + spin_unlock(&root->lock); > + break; > + } > + > + ino = hi_nodes[n - 1]->i_ino + 1; > + for (i = 0; i < n; i++) { > + kref_get(&hi_nodes[i]->hot_inode.refs); > + hot_map_array_update( > + &hi_nodes[i]->hot_inode.hot_freq_data, root); > + hot_range_update(hi_nodes[i], root); > + hot_inode_item_put(hi_nodes[i]); > + } > + spin_unlock(&root->lock); This is a lot of work to do under a spin lock. Perhaps you should get a reference on all the nodes, then drop the root->lock and then update all the nodes in a separate loop. > + } > + > + if (unlikely(freezing(current))) { > + __refrigerator(true); > + } else { > + set_current_state(TASK_INTERRUPTIBLE); > + if (!kthread_should_stop()) { > + schedule_timeout(delay); > + } > + __set_current_state(TASK_RUNNING); > + } > + } while (!kthread_should_stop()); I don't think you understand workqueues fully. A work queue worker function is not something that executes endlessly. It is a "one-shot" function that does the work once, not an endless loop that has to delay it's execution for periodic work. If you need periodic work, then you should use a struct delayed_work and queue the next work iteration to be run a later time. See, for example, xfs_syncd_worker() and xfs_syncd_queue_sync() and how that reschedules itself for periodic work. It also means you don't have to handle kthread freezing, as the WQ infrastructure takes care of that for you. This is why unmount is hanging for me - this work never completes, so flush_workqueue() will never return. > +} > + > +static int hot_wq_init(struct hot_info *root) > +{ > + struct hot_update_work *hot_work; > + int ret = 0; > + > + root->update_wq = alloc_workqueue( > + "hot_temperature_update", WQ_MEM_RECLAIM | WQ_UNBOUND, 1); > + if (!root->update_wq) { > + printk(KERN_ERR "%s: failed to create " > + "temperature update workqueue\n", > + __func__); > + return 1; > + } > + > + hot_work = kmalloc(sizeof(*hot_work), GFP_NOFS); > + if (hot_work) { > + hot_work->hot_info = root; > + INIT_WORK(&hot_work->work, hot_temperature_update_work); > + queue_work(root->update_wq, &hot_work->work); > + } else { > + printk(KERN_ERR "%s: failed to create update work\n", > + __func__); > + ret = 1; > + } I don't understand why you need a separate "hot_work" structure. just embed a struct delayed_work in the struct hot_info and use container_of() to get the struct hot_info from the work structure. As such, there's no need for a separate function just for this initialisation - just put it in line. > + > + return ret; > +} > + > +static void hot_wq_exit(struct workqueue_struct *wq) > +{ > + flush_workqueue(wq); flush_workqueue_sync(). > + destroy_workqueue(wq); > +} And there's not need for separate function for this - put it in line. FWIW, it also leaks the hot_work structure, but you're going to remove that anyway. ;) > diff --git a/fs/hot_tracking.h b/fs/hot_tracking.h > index d19e64a..7a79a6d 100644 > --- a/fs/hot_tracking.h > +++ b/fs/hot_tracking.h > @@ -36,6 +36,9 @@ > */ > #define TIME_TO_KICK 400 > > +/* set how often to update temperatures (seconds) */ > +#define HEAT_UPDATE_DELAY 400 FWIW, 400 seconds is an unusual time period. It's expected that periodic work might take place at intervals of 5 minutes, 10 minutes, etc, not 6m40s. It's much easier to predict and understand behaviour if it's at a interval of whole units like minutes, especially when looking at timestamped event traces. Hence 300s (5 minutes) makes a lot more sense as a period for updates... > /* > * The following comments explain what exactly comprises a unit of heat. > * > diff --git a/include/linux/hot_tracking.h b/include/linux/hot_tracking.h > index 7114179..b37e0f8 100644 > --- a/include/linux/hot_tracking.h > +++ b/include/linux/hot_tracking.h > @@ -84,6 +84,8 @@ struct hot_info { > > /* map of range temperature */ > struct hot_map_head heat_range_map[HEAT_MAP_SIZE]; > + > + struct workqueue_struct *update_wq; Add the struct delayed_work here, too. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html