raid1 data corruption during resync

Eivind Sarto <eivindsarto@xxxxxxxxx> · Fri, 29 Aug 2014 12:29:52 -0700

I am seeing occasional data corruption during raid1 resync.
Reviewing the raid1 code, I suspect that commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 introduced a bug.
Before this commit, raise_barrier() waited for conf->nr_pending to drop to zero.  It no longer does.
The corruption is not easy to reproduce, so I wanted to ask about the following potential fix while I am still testing it.
Once I have validated that the fix works, I will post a proper patch.
Do you have any feedback?

--- drivers/md/raid1.c	2014-08-22 15:19:15.000000000 -0700
+++ /tmp/raid1.c	2014-08-29 12:07:51.000000000 -0700
@@ -851,7 +851,7 @@ static void raise_barrier(struct r1conf 
 	 *    handling.
 	 */
 	wait_event_lock_irq(conf->wait_barrier,
-			    !conf->array_frozen &&
+			    !conf->array_frozen && !conf->nr_pending &&
 			    conf->barrier < RESYNC_DEPTH &&
 			    (conf->start_next_window >=
 			     conf->next_resync + RESYNC_SECTORS),

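To make the difference concrete, here is a minimal user-space sketch of the wait condition with and without the extra check. It models only the predicate from the hunk above, not the locking or wakeups in raid1.c. The names struct r1conf_model, may_raise_current(), may_raise_proposed() and model_self_check(), as well as the constant values, are illustrative assumptions and not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel constants (assumed values). */
#define RESYNC_DEPTH   32
#define RESYNC_SECTORS 128

/* Simplified model holding only the fields the wait condition reads. */
struct r1conf_model {
	bool array_frozen;
	int nr_pending;                       /* normal I/O still in flight */
	int barrier;                          /* resync barriers currently raised */
	unsigned long long start_next_window;
	unsigned long long next_resync;
};

/* Condition as shown on the '-' side of the hunk: nr_pending is ignored. */
static bool may_raise_current(const struct r1conf_model *c)
{
	return !c->array_frozen &&
	       c->barrier < RESYNC_DEPTH &&
	       c->start_next_window >= c->next_resync + RESYNC_SECTORS;
}

/* Condition as shown on the '+' side: also require no pending normal I/O. */
static bool may_raise_proposed(const struct r1conf_model *c)
{
	return !c->nr_pending && may_raise_current(c);
}

/* Small self-check: one pending request is the only thing that differs. */
static void model_self_check(void)
{
	struct r1conf_model c = {
		.array_frozen = false,
		.nr_pending = 1,
		.barrier = 0,
		.start_next_window = 1024,
		.next_resync = 0,
	};

	assert(may_raise_current(&c));    /* resync would proceed */
	assert(!may_raise_proposed(&c));  /* resync would keep waiting */

	c.nr_pending = 0;
	assert(may_raise_proposed(&c));   /* proceeds once I/O has drained */
}
```

With one request still pending, the current condition lets the barrier be raised while the proposed one keeps waiting. Whether waiting on nr_pending in this way is the right fix is exactly what I am asking about.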
