On Mon, 2003-10-20 at 03:43, Dick Streefland wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> | Thanks for providing a script...
> | It works fine for me (2.6.0-test8).
> |
> | I don't suppose there is anything in the kernel logs about write
> | errors on loop2 ???
>
> No, there was nothing unusual in the log files. I have no access to
> the test machine at the moment, but there is a message when the
> recovery starts, and a few seconds later the message "sync done".
>
> | Does it fail consistently for you, or only occasionally?
>
> It fails every time. This test was on a dual PIII 450 system, but it
> also fails on a VIA C6 system with the 2.6.0-test5 kernel. Both
> kernels are compiled without CONFIG_PREEMPT, because I had other
> problems that might be related to this option:
>
>   http://www.spinics.net/lists/raid/msg03507.html
>
> Could this be related to CONFIG_DM_IOCTL_V4? I was not sure about this
> option, and have not enabled it. Otherwise, I think it is time to put
> in some printk's. Do you have suggestions where to start looking?

I have been experiencing the same problem on my test machine. I found
that the resync terminates early because the MD_RECOVERY_ERR bit gets
set by raid1's sync_write_request(). I don't understand why it fails
the sync when all the writes have already completed successfully and
quickly. If there is a need to check for "nowhere to write this to" as
in the 2.4.x kernels, I think we need a different check. The following
patch against the 2.6.0-test8 kernel seems to fix it.

--- a/raid1.c	2003-10-17 16:43:14.000000000 -0500
+++ b/raid1.c	2003-10-22 11:57:59.350900256 -0500
@@ -841,7 +841,7 @@
 	}
 
 	if (atomic_dec_and_test(&r1_bio->remaining)) {
-		md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
+		md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 1);
 		put_buf(r1_bio);
 	}
 }
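
To illustrate why the last argument matters, here is a small userspace
model of the behaviour described above. It is only a sketch based on my
reading of the symptom, not the kernel code: struct model_mddev,
MODEL_RECOVERY_ERR, model_done_sync() and model_resync_aborted() are
all hypothetical names, and the model assumes that the completion call
sets the error bit whenever its "ok" argument is 0, and that the resync
loop stops as soon as it sees that bit. The ">> 9" in the patch converts
bi_size from bytes to 512-byte sectors, so the model counts sectors too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MD_RECOVERY_ERR bit; the value is arbitrary. */
#define MODEL_RECOVERY_ERR (1UL << 0)

/* Hypothetical, heavily reduced stand-in for the md device state. */
struct model_mddev {
    unsigned long recovery;   /* recovery flag bits */
    long recovery_active;     /* sectors submitted but not yet accounted for */
};

/*
 * Model of the completion call: account for `sectors` finished sectors
 * and, under the assumption stated above, flag an error when ok is false.
 */
static void model_done_sync(struct model_mddev *mddev, long sectors, bool ok)
{
    assert(mddev != NULL);
    mddev->recovery_active -= sectors;
    if (!ok)
        mddev->recovery |= MODEL_RECOVERY_ERR;
}

/* Model of the resync loop's check: a set error bit ends the resync early. */
static bool model_resync_aborted(const struct model_mddev *mddev)
{
    return (mddev->recovery & MODEL_RECOVERY_ERR) != 0;
}
```

In this model, completing a request with ok == 0 marks the resync as
failed even though every write succeeded, which matches the symptom;
passing 1 leaves the error bit clear and lets the resync continue.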