Re: [md PATCH 4/4] md/bitmap: Don't write bitmap while earlier writes might be in-flight

On Tue, Nov 08 2016, Shaohua Li wrote:

> On Mon, Nov 07, 2016 at 09:53:42AM +1100, Neil Brown wrote:
>> On Sat, Nov 05 2016, Shaohua Li wrote:
>> 
>> > On Fri, Nov 04, 2016 at 04:46:03PM +1100, Neil Brown wrote:
>> >> As we don't wait for writes to complete in bitmap_daemon_work, they
>> >> could still be in-flight when bitmap_unplug writes again.  Or when
>> >> bitmap_daemon_work tries to write again.
>> >> This can be confusing and could risk the wrong data being written last.
>> >
>> > Applied the first 3 patches, thanks!
>> >
>> > This one doesn't seem to completely solve the race condition. It's still
>> > possible that bitmap_daemon_work clears BITMAP_PAGE_NEEDWRITE but hasn't
>> > dispatched the IO yet; bitmap_unplug then does nothing and thinks the
>> > bitmap has been written to disk. Why don't we add locking here?
>> 
>> Thanks for the review!
>> 
>> BITMAP_PAGE_NEEDWRITE is set for pages that need to be written out in
>> order to clear bits that don't need to be set any more.  There is never
>> any urgency to do this.
>> BITMAP_PAGE_DIRTY is set on pages that need to be written out in order
>> to set bits representing regions that are about to be written to.  These
>> have to be flushed by bitmap_unplug().
>> Pages can have both bits set, in which case bitmap_daemon_work() will
>> leave them for bitmap_unplug() to deal with.
>> 
>> So if bitmap_daemon_work() clears BITMAP_PAGE_NEEDWRITE on a page, then it
>> is a page that bitmap_unplug() doesn't need to wait for.
>
> Oops, I misread the code. Yes, this is very clear, thanks for the explanation.
>
> So I understand that the patch fixes the confusion. But when is there a risk
> that the wrong data is written?

The goal is to avoid the possibility that two writes to the same
location are in flight at the same time.
It isn't clear that this would cause a problem, but it isn't clear that
it is completely safe either.
In general, this doesn't happen - certainly not from the page cache - so
drivers will not be prepared for it.
As an extreme example, suppose that the target device is a multi-path
device.
One write is sent and just after it is DMAed to one path, the in-memory
page is changed and a second write is submitted, and the data is DMAed
down the other path.  Are we certain the two will be committed to
storage in the intended order?
Probably they will be.  But maybe with a more complex stack, the chance
of one IO overtaking the other (even after the first DMA) increases.

So I cannot argue very strongly for this code, but it seems like a good
idea and is reasonably simple....
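
In outline, the change amounts to something like the sketch below.  This
is illustrative rather than the actual diff: it leans on the existing
bitmap->pending_writes counter (incremented when a bitmap page write is
submitted, decremented by the completion callback, which also wakes
bitmap->write_wait), and the helper name is made up for the example.

    static void bitmap_wait_writes(struct bitmap *bitmap)
    {
            if (bitmap->storage.file)
                    /* File-backed bitmap: page writes are counted in
                     * pending_writes, and each completion wakes
                     * write_wait. */
                    wait_event(bitmap->write_wait,
                               atomic_read(&bitmap->pending_writes) == 0);
            else
                    /* Bitmap stored on the md device itself: those writes
                     * go through the superblock machinery, so wait for
                     * that instead. */
                    md_super_wait(bitmap->mddev);
    }

Calling such a helper at the start of bitmap_unplug(), and again before
the write-out pass in bitmap_daemon_work(), ensures that any earlier
write to a bitmap page has completed before the page can be modified and
submitted again.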

Thanks,
NeilBrown
