On Tue, Nov 08, 2016 at 07:19:05AM +1100, Neil Brown wrote:
> On Tue, Nov 08 2016, Shaohua Li wrote:
>
> > On Mon, Nov 07, 2016 at 09:53:42AM +1100, Neil Brown wrote:
> >> On Sat, Nov 05 2016, Shaohua Li wrote:
> >>
> >> > On Fri, Nov 04, 2016 at 04:46:03PM +1100, Neil Brown wrote:
> >> >> As we don't wait for writes to complete in bitmap_daemon_work, they
> >> >> could still be in-flight when bitmap_unplug writes again. Or when
> >> >> bitmap_daemon_work tries to write again.
> >> >> This can be confusing and could risk the wrong data being written last.
> >> >
> >> > Applied the first 3 patches, thanks!
> >> >
> >> > This one doesn't seem to completely solve the race condition. It's still
> >> > possible that bitmap_daemon_work clears BITMAP_PAGE_NEEDWRITE but hasn't
> >> > dispatched the IO yet, so bitmap_unplug does nothing and thinks the bitmap
> >> > is updated on disk. Why don't we add locking here?
> >>
> >> Thanks for the review!
> >>
> >> BITMAP_PAGE_NEEDWRITE is set for pages that need to be written out in
> >> order to clear bits that don't need to be set any more. There is never
> >> any urgency to do this.
> >> BITMAP_PAGE_DIRTY is set for pages that need to be written out in order
> >> to set bits representing regions that are about to be written to. These
> >> have to be flushed by bitmap_unplug().
> >> Pages can have both bits set, in which case bitmap_daemon_work() will
> >> leave them for bitmap_unplug() to deal with.
> >>
> >> So if bitmap_daemon_work() clears BITMAP_PAGE_PENDING on a page, then it
> >> is a page that bitmap_unplug() doesn't need to wait for.
> >
> > Oops, I misread the code. Yes, this is very clear, thanks for the explanation.
> >
> > So I understand that the patch fixes the confusion. When is there a risk
> > that the wrong data is written?
>
> The goal is to avoid the possibility that two writes to the same
> location are in flight at the same time.
> It isn't clear that this would cause a problem, but it isn't clear that
> it is completely safe either.
> In general, this doesn't happen - certainly not from the page cache - so
> drivers will not be prepared for it.
> As an extreme example, suppose that the target device is a multi-path
> device.
> One write is sent and just after it is DMAed to one path, the in-memory
> page is changed and a second write is submitted, and the data is DMAed
> down the other path. Are we certain the two will be committed to
> storage in the intended order?
> Probably they will be. But maybe with a more complex stack, the chance
> of one IO overtaking the other (even after the first DMA) increases.
>
> So I cannot argue very strongly for this code, but it seems like a good
> idea and is reasonably simple....

I wasn't trying to argue against the patch; it actually makes things clearer.
I just wanted to know where the data corruption could come from. I think I
have an answer now. I'll add this patch.

Thanks,
Shaohua
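
For anyone skimming the thread, here is a minimal standalone sketch of the
division of labour Neil describes above. It is not the md/bitmap.c code; the
struct and helper names (bm_page, daemon_work, unplug, write_out) are
hypothetical and chosen only to illustrate which path is responsible for
writing out which kind of page: the lazy daemon skips any page that is also
DIRTY, so the unplug path stays the single writer for such pages.

    /*
     * Simplified, self-contained model of the flag handling described in
     * the thread. Names here are hypothetical, not the kernel's.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct bm_page {
        bool dirty;      /* BITMAP_PAGE_DIRTY: bits must hit disk before the data write */
        bool needwrite;  /* BITMAP_PAGE_NEEDWRITE: stale bits to clear, no urgency */
    };

    /* Stand-in for actually submitting the page to the backing device. */
    static void write_out(struct bm_page *p, int idx)
    {
        printf("writing bitmap page %d\n", idx);
        p->dirty = false;
        p->needwrite = false;
    }

    /*
     * Periodic daemon: only writes pages that merely need stale bits
     * cleared. Pages that are also DIRTY are left for unplug(), so the
     * two paths never race to write the same page.
     */
    static void daemon_work(struct bm_page *pages, int n)
    {
        for (int i = 0; i < n; i++)
            if (pages[i].needwrite && !pages[i].dirty)
                write_out(&pages[i], i);
    }

    /*
     * Unplug path: must flush every DIRTY page before the data writes
     * that set those bits are allowed to proceed.
     */
    static void unplug(struct bm_page *pages, int n)
    {
        for (int i = 0; i < n; i++)
            if (pages[i].dirty)
                write_out(&pages[i], i);
    }

    int main(void)
    {
        struct bm_page pages[3] = {
            { .dirty = true },                     /* flushed by unplug() */
            { .needwrite = true },                 /* flushed lazily by daemon_work() */
            { .dirty = true, .needwrite = true },  /* daemon skips it; unplug flushes it */
        };

        daemon_work(pages, 3);  /* writes page 1 only */
        unplug(pages, 3);       /* writes pages 0 and 2 */
        return 0;
    }

The point of the "skip if also DIRTY" test in the daemon is exactly the
property discussed above: a page that bitmap_unplug() must flush is never
picked up by the background work, so the unplug path does not need to wait
for, or risk overlapping with, an earlier in-flight write from the daemon.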