On Tue, 1 Feb 2011 15:45:16 -0500 Paul Clements
<paul.clements@xxxxxxxxxxx> wrote:

> On Wed, Jan 26, 2011 at 8:55 AM, Paul Clements
> <paul.clements@xxxxxxxxxxx> wrote:
>
> > Attached is a modified patch, which does the extra necessary work
> > (bitmap_endwrite, md_write_end) on the bio before failing it.
> >
> > Does this look correct? It seems to work.
>
> Well, not quite...it's more complicated. From my reading of the code,
> it looks like behind writes and barrier retries just do not work
> correctly together. The issue is this:
>
> - With behind writes, we signal the master_bio complete as soon as all
> non-write-behind writes are complete.
>
> - With barrier retries, you don't know if you'll need to retry until
> you've completed all legs of the write (the last leg to complete might
> throw EOPNOTSUPP).
>
> So in the case where the master_bio has been completed, we still try
> to do a retry for the leg that failed the barrier (but it's really too
> late to retry). In any case, raid1d is touching master_bio (looking at
> bi_size and bio_cloning it) during the retry, which causes a panic if
> master_bio is already being reused by someone else.
>
> I can't think of a good way to do behind writes and barrier retries
> together. Seems we've got to disable behind writes for barriers, or
> we've got to disable barrier retries when doing behind writes...
>
> Any thoughts?

I suspect you are right that barriers and behind writes are deeply
incompatible. They could probably be made to work together in some
vaguely sane way, but I suspect it would be a lot of work and not worth
the effort.

Disabling behind-writes for all barrier requests would be quite easy, but
it might negate a lot of the value of behind writes.

We could simply ignore the barrier flag on writes to behind-write devices,
but that would risk them being even more inconsistent than they currently
can be, so I doubt that is a good direction - though it is a possibility.

I think the best option is to reject barrier writes if there are any
behind-write devices. That would be reasonably safe and reasonably
consistent.

So maybe something like this??

NeilBrown

Index: linux-2.6.32.orig/drivers/md/bitmap.c
===================================================================
--- linux-2.6.32.orig.orig/drivers/md/bitmap.c	2009-12-03 14:51:21.000000000 +1100
+++ linux-2.6.32.orig/drivers/md/bitmap.c	2011-02-02 11:31:51.156585883 +1100
@@ -1676,6 +1676,8 @@ int bitmap_create(mddev_t *mddev)
 		pages, bmname(bitmap));
 	mddev->bitmap = bitmap;
+	if (bitmap->max_write_behind)
+		mddev->barriers_work = 0;
 	mddev->thread->timeout = bitmap->daemon_sleep * HZ;
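
For anyone following the thread, the ordering problem Paul describes and
the policy suggested above can be sketched as a small user-space C model.
This is only an illustration under stated assumptions: struct leg and both
function names are hypothetical and are not md symbols, legs are assumed
to complete in array order, and the acceptance check is done per request
rather than through the array-wide barriers_work flag that the patch sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one leg (one mirror) of a RAID1 write. */
struct leg {
	bool write_behind;	/* leg goes to a write-behind device */
	bool barrier_fails;	/* leg would complete with EOPNOTSUPP */
};

/*
 * Model of the problem case. Legs complete in array order (assumption).
 * The master request is signalled complete as soon as the last
 * non-write-behind leg finishes, but the retry decision is only known
 * once every leg has finished. Returns true when a retry is needed and
 * the master was already signalled while legs were still outstanding.
 */
bool retry_touches_completed_master(const struct leg *legs, size_t n)
{
	size_t sync_left = 0;
	bool need_retry = false;
	bool master_done_early = false;
	size_t i;

	assert(legs != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (!legs[i].write_behind)
			sync_left++;

	for (i = 0; i < n; i++) {
		if (legs[i].barrier_fails)
			need_retry = true;
		/* Last non-behind leg done while later legs are pending. */
		if (!legs[i].write_behind && --sync_left == 0 && i + 1 < n)
			master_done_early = true;
	}
	return need_retry && master_done_early;
}

/*
 * Model of the suggested policy: refuse barrier writes whenever any
 * write-behind leg is present, so the case above cannot arise.
 */
bool accept_barrier_write(const struct leg *legs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (legs[i].write_behind)
			return false;
	return true;
}
```

With one ordinary leg followed by a write-behind leg that fails the
barrier, the first function reports the unsafe retry and the second
rejects the request up front; with no write-behind legs, the barrier
write is accepted and the retry is still safe.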