On Mon, 25 Feb 2002, Neil Brown wrote:

> As he says, the patch is rather ugly and doesn't really address the
> root problem. But if it works for you, that is good.
>

I think it's ugly because it puts some structures into the raid5.c
code that should be accessible from structures already defined in the
code (function). I do think that the patch is right as far as the root
of the problem is concerned.

What I don't understand is why the ->faulty flag is used all through
md.c when we have mark_disk_faulty(sb->disks+disk->number); and a
bitmapped status for the same purpose. Are they different in any case,
or is it that the mdp_disk_t structure used in disk_faulty is not
accessible in those places?

It seems that on SMP machines md_wakeup_thread gets executed on another
CPU before ->faulty has been set. If there were a way to set ->faulty
in raid5_error without calling rrdev = find_rdev(mddev, dev); and
friends, this would be quite the right fix.

I also suspect that the same race exists in the mirror code (probably
others too), since I don't see any lock and the logic seems to me
exactly the same.

> I think that the "right" approach is to claim reconfig_sem (which is
> currently unused I think) while writing out the superblocks, and when
> releasing the per-device superblock, and probably when doing a few
> other things.

I wouldn't know about those. But if I look closer at raid5.c, we kill
ourselves on SMP machines by calling md_wakeup_thread in any case.
Would a call to wake_up(&thread->wqueue); honor this mutex and wait
for md_error to finish?

> I will have a closer look over the code and see how well this can
> work.
>

Please do. I'm holding back the release of a few servers into
production until this race is properly fixed, and I'm looking forward
to the proper fix. So we have test computers at our disposal for
testing for at least this week.

Regards,
gody
__________________________________________________________________
| Matjaz Godec       | Agenda d.o.o.       | ISP for business    |
| Tech. Manager      | Gosposvetska 84     | WAN networks        |
| gody@slon.net      | si-2000 Maribor     | Internet/Intranet   |
| tel:+386.2.2340860 | Slovenija           | Application servers |
|http://www.slon.net |http://www.agenda.si | Caldera OpenLinux   |
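
The ordering concern raised above (the woken thread running on another
CPU before ->faulty is set) can be illustrated with a minimal userspace
sketch. This is only an analogy, not the md code and not the proposed
patch: pthreads stand in for the kernel primitives, and every name in
it (fake_dev, report_error, recovery_thread, run_demo) is hypothetical.
It assumes the fix takes the form "set the flag and request the wakeup
under one lock that the woken thread also takes".

```c
/* Userspace analogy only. Nothing here is taken from md.c or raid5.c;
 * all identifiers are hypothetical stand-ins. */
#include <assert.h>
#include <pthread.h>

struct fake_dev {
    pthread_mutex_t lock;   /* plays the role of a reconfig mutex */
    pthread_cond_t  wake;   /* plays the role of the thread wait queue */
    int faulty;             /* plays the role of the ->faulty flag */
    int woken;              /* set once a wakeup has been requested */
    int seen_faulty;        /* what the recovery thread observed */
};

/* Error path: mark the device faulty and request the wakeup while
 * holding the same lock the recovery thread takes, so the thread
 * cannot observe the wakeup without also observing the flag. */
static void report_error(struct fake_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->faulty = 1;
    d->woken = 1;
    pthread_cond_signal(&d->wake);
    pthread_mutex_unlock(&d->lock);
}

/* Recovery thread: sleeps until woken, then reads the flag under
 * the lock. */
static void *recovery_thread(void *arg)
{
    struct fake_dev *d = arg;

    pthread_mutex_lock(&d->lock);
    while (!d->woken)
        pthread_cond_wait(&d->wake, &d->lock);
    d->seen_faulty = d->faulty;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Runs one error/wakeup cycle and returns the flag value the
 * recovery thread saw. */
int run_demo(void)
{
    struct fake_dev d = { .faulty = 0, .woken = 0, .seen_faulty = 0 };
    pthread_t tid;
    int rc;

    pthread_mutex_init(&d.lock, NULL);
    pthread_cond_init(&d.wake, NULL);

    rc = pthread_create(&tid, NULL, recovery_thread, &d);
    assert(rc == 0);
    report_error(&d);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    pthread_cond_destroy(&d.wake);
    pthread_mutex_destroy(&d.lock);
    return d.seen_faulty;
}
```

Calling run_demo() from a small main() (built with -pthread) exercises
one cycle. Whether the kernel's wake_up() path can be made to honor a
semaphore in a comparable way is exactly the open question in this
thread; the sketch does not answer it.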