Re: [PATCH 1/1] Call md_handle_request directly in md_flush_request

David Jeffery <djeffery@xxxxxxxxxx> · Wed, 18 Sep 2019 15:15:17 -0400

On Tue, Sep 17, 2019 at 11:21 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> md_flush_request returns false when a flush bio has data, and the
> pers->make_request function goes on handling it. For example, take a
> raid1 device: md_flush_request returns false and raid1_make_request
> goes on handling the bio. If raid1_make_request fails, the bio is still
> lost. Right now it looks like only md_handle_request checks the return
> value of pers->make_request and goes on handling the bio if
> pers->make_request fails.

But the bio isn't lost.  Using raid1 as an example, the calling sequence is
md_handle_request -> raid1_make_request -> md_flush_request.
raid1_make_request is already wrapped by md_handle_request.  So this
earlier call to md_handle_request will re-submit the bio if raid1_make_request
returns false after md_flush_request returns false.  Anything which calls an
mddev->pers->make_request function (only md_handle_request, after the patch)
must already handle a return of false, or it would also have a bug allowing I/O
to be lost.
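
To make the calling sequence concrete, here is a toy model of the retry
behaviour described above. It is only a sketch: struct toy_bio, its fields,
and the toy_* functions are hypothetical stand-ins, not the kernel's types or
signatures, and the "fail_budget" counter is an invented way to force
make_request to return false a few times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bio; none of these fields exist in the kernel. */
struct toy_bio {
	bool preflush;     /* models the flush flag being set on the bio */
	bool has_data;     /* bio carries a payload besides the flush */
	int  fail_budget;  /* assumed: times make_request refuses the bio */
	int  attempts;     /* calls made into make_request so far */
	bool completed;
};

/*
 * Models md_flush_request(): true means the flush path took the bio,
 * false means the caller must go on handling it.
 */
static bool toy_flush_request(struct toy_bio *bio)
{
	if (!bio->has_data) {
		bio->completed = true;  /* pure flush, nothing left to do */
		return true;
	}
	bio->preflush = false;          /* flush done, data part remains */
	return false;
}

/* Models a pers->make_request function such as raid1_make_request(). */
static bool toy_make_request(struct toy_bio *bio)
{
	bio->attempts++;
	if (bio->preflush && toy_flush_request(bio))
		return true;
	if (bio->fail_budget > 0) {
		bio->fail_budget--;
		return false;           /* ask the caller to re-submit */
	}
	bio->completed = true;
	return true;
}

/*
 * Models md_handle_request(): the outer wrapper keeps re-submitting
 * until make_request accepts the bio, so a false return loses nothing.
 */
static int toy_handle_request(struct toy_bio *bio)
{
	assert(bio != NULL);
	while (!toy_make_request(bio))
		;  /* the real code would wait before retrying */
	return bio->attempts;
}
```

With preflush and has_data set and fail_budget = 2, toy_handle_request()
makes three calls into toy_make_request() and the bio ends up completed; the
flush flag is already cleared by the first call, so the retries skip the
flush path.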

>
> There should not be a deadlock if it calls md_handle_request directly.
> Am I right? If there is a risk, we can put those bios on a list and
> queue work on a workqueue to handle them. Would that be a better way?

I don't see a deadlock with calling md_handle_request from md_flush_request.
It's just more stack and overhead when we could instead let the first calls to
these functions advance the I/O instead of recursing into new instances.
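
The stack cost can be illustrated with another toy model, again only a
sketch under stated assumptions: the depth counter and all toy_* names are
hypothetical, and each function stands for one stack frame. With
recurse == false the flush step returns false and the existing frames carry
on; with recurse == true the flush step re-enters the top-level handler.

```c
#include <stdbool.h>

/* Hypothetical nesting tracker; nothing like this exists in md. */
static int depth, max_depth;

static void enter(void)
{
	if (++depth > max_depth)
		max_depth = depth;
}

static void leave(void)
{
	depth--;
}

static void toy_handle(bool recurse, bool preflush);

/* Models md_flush_request() for a flush bio that also carries data. */
static bool toy_flush(bool recurse)
{
	bool handled = false;

	enter();
	if (recurse) {
		/* direct-call variant: new handler instance, flag cleared */
		toy_handle(recurse, false);
		handled = true;
	}
	/* otherwise return false and let the caller continue */
	leave();
	return handled;
}

/* Models a pers->make_request function. */
static void toy_make_request(bool recurse, bool preflush)
{
	enter();
	if (preflush && toy_flush(recurse)) {
		leave();
		return;
	}
	/* the data portion would be handled here, in this frame */
	leave();
}

/* Models md_handle_request(). */
static void toy_handle(bool recurse, bool preflush)
{
	enter();
	toy_make_request(recurse, preflush);
	leave();
}

/* Deepest nesting reached while handling one flush-with-data bio. */
static int toy_max_depth(bool recurse)
{
	depth = max_depth = 0;
	toy_handle(recurse, true);
	return max_depth;
}
```

In this toy, toy_max_depth(false) is 3 and toy_max_depth(true) is 5: the
direct call stacks a second handler and make_request frame on top of the
first pair instead of reusing them.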

>
> Regards
> Xiao

David Jeffery