Re: [PATCHv4 17/17] writeback: lessen sync_supers wakeup count

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 27, 2010 at 01:51:09PM +0300, Artem Bityutskiy wrote:
> Nick, thanks for serialization suggestion.
> 
> On Thu, 2010-05-27 at 17:22 +1000, Nick Piggin wrote:
> > Yeah, we definitely don't want to add global cacheline writes in the
> > common case. Also I don't know why you do the strange -1 value. I
> > couldn't seem to find where you defined bdi_arm_supers_timer();
> 
> It is in mm/backing-dev.c:376 in today's Linus' tree. The -1 is used to

Yep I should have grepped. /hangs head


> indicate that 'sync_supers()' is in progress and avoid arming timer in
> that case. But yes, this is not really needed.

OK please remove it.

 
> > But why doesn't this work?
> > 
> >   sb->s_dirty = 1;
> >   smp_mb(); /* corresponding MB is in test_and_clear_bit */
> 
> AFAIU, test_and_clear_bit assumes 2 barriers - before the test and after
> the clear. Then I do not really understand why this smp_mb is needed.

You almost always need barriers executed on all sides of the
synchronisation protocol. Actually we need another, I confused
myself with the test_and_clear at the end.

1. sb->s_dirty = 1; /* store */
2. if (!supers_timer_armed) /* load */
3.   supers_timer_armed = 1; /* store */

and

A. supers_timer_armed = 0; /* store */
B. if (sb->s_dirty) /* load */
C.   sb->s_dirty = 0 /* store */

If these two sequences are executed, it must result in
sb->s_dirty == 1 iff supers_timer_armed

* If 2 is executed before 1 is visible, then 2 may miss A before B sees 1.
* If B is executed before A is visible, then B may miss 1 before 2 sees A.

So we need smp_mb() between 1/2 and A/B (I missed the 2nd one).

Now we still have a problem. After sync task rechecks
supers_timer_armed, the supers timer might execute before we mark
ourself as sleeping, and so we have another lost wakeup. It needs
to be checked after set_current_state.

Let's try this again. I much prefer to name the variable something
that indicates whether there is more work to be done, or whether we
can sleep.

How about something like this?
--

Index: linux-2.6/mm/backing-dev.c
===================================================================
--- linux-2.6.orig/mm/backing-dev.c
+++ linux-2.6/mm/backing-dev.c
@@ -45,6 +45,7 @@ LIST_HEAD(bdi_pending_list);
 
 static struct task_struct *sync_supers_tsk;
 static struct timer_list sync_supers_timer;
+static unsigned long supers_dirty __read_mostly;
 
 static int bdi_sync_supers(void *);
 static void sync_supers_timer_fn(unsigned long);
@@ -251,7 +252,6 @@ static int __init default_bdi_init(void)
 
 	init_timer(&sync_supers_timer);
 	setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
-	bdi_arm_supers_timer();
 
 	err = bdi_init(&default_backing_dev_info);
 	if (!err)
@@ -362,17 +362,28 @@ static int bdi_sync_supers(void *unused)
 
 	while (!kthread_should_stop()) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		schedule();
+		if (!supers_dirty)
+			schedule();
+		else
+			__set_current_state(TASK_RUNNING);
 
+		supers_dirty = 0;
 		/*
-		 * Do this periodically, like kupdated() did before.
+		 * supers_dirty store must be visible to mark_sb_dirty (below)
+		 * before sync_supers runs (which loads sb->s_dirty).
 		 */
+		smp_mb();
 		sync_supers();
 	}
 
 	return 0;
 }
 
+static void sync_supers_timer_fn(unsigned long unused)
+{
+	wake_up_process(sync_supers_tsk);
+}
+
 void bdi_arm_supers_timer(void)
 {
 	unsigned long next;
@@ -384,9 +395,17 @@ void bdi_arm_supers_timer(void)
 	mod_timer(&sync_supers_timer, round_jiffies_up(next));
 }
 
-static void sync_supers_timer_fn(unsigned long unused)
+void mark_sb_dirty(struct super_block *sb)
 {
-	wake_up_process(sync_supers_tsk);
+	sb->s_dirty = 1;
+	/*
+	 * sb->s_dirty store must be visible to sync_supers (above) before we
+	 * load supers_dirty in case we need to re-arm the timer.
+	 */
+	smp_mb();
+	if (likely(supers_dirty))
+		return;
+	supers_dirty = 1;
 	bdi_arm_supers_timer();
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux