On 09/19/2019 03:15 AM, David Jeffery wrote:
On Tue, Sep 17, 2019 at 11:21 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
md_flush_request returns false when one flush bio has data and
pers->make_request function go
on handling it. For example the raid device is raid1. md_flush_request
returns false, raid1_make_request
go on handling the bio. If raid1_make_request fails, the bio is still
lost. Now it looks like only md_handle_request
checks the return value of pers->make_request and go on handling the bio
if pers->make_request fails.
But the bio isn't lost. Using raid1 as an example, the calling sequence is
md_handle_request -> raid1_make_request -> md_flush_request.
raid1_make_request is already wrapped by md_handle_request. So this
earlier call to md_handle_request will re-submit the bio if raid1_make_request
returns false after md_flush_request returns false. Anything which calls an
mddev->pers->make_request function (only md_handle_request after patch)
must already handle a return of false or it would also have a bug allowing I/O
to be lost.
Thanks for the explanation. You are right. The bio can't be lost.
Regards
Xiao
There should not be a deadlock if it calls md_handle_request directly.
Am I right? If there is a risk, we
can put those bios into a list and queue a work in workqueue to handle
them. Is it a better way?
I don't see a deadlock with calling md_handle_request from md_flush_request.
It's just more stack and overhead when we could instead let the first calls to
these functions advance the I/O instead of recursing into new instances.
Regards
Xiao
David Jeffery