On Wed, Sep 18, 2019 at 10:19 PM Xiao Ni <xni@xxxxxxxxxx> wrote: > > > > On 09/19/2019 03:15 AM, David Jeffery wrote: > > On Tue, Sep 17, 2019 at 11:21 PM Xiao Ni <xni@xxxxxxxxxx> wrote: > >> md_flush_request returns false when one flush bio has data and > >> pers->make_request function go > >> on handling it. For example the raid device is raid1. md_flush_request > >> returns false, raid1_make_request > >> go on handling the bio. If raid1_make_request fails, the bio is still > >> lost. Now it looks like only md_handle_request > >> checks the return value of pers->make_request and go on handling the bio > >> if pers->make_request fails. > > But the bio isn't lost. Using raid1 as an example, the calling sequence is > > md_handle_request -> raid1_make_request -> md_flush_request. > > raid1_make_request is already wrapped by md_handle_request. So this > > earlier call to md_handle_request will re-submit the bio if raid1_make_request > > returns false after md_flush_request returns false. Anything which calls an > > mddev->pers->make_request function (only md_handle_request after patch) > > must already handle a return of false or it would also have a bug allowing I/O > > to be lost. > > Thanks for the explanation. You are right. The bio can't be lost. > > Regards > Xiao > > > >> There should not be a deadlock if it calls md_handle_request directly. > >> Am I right? If there is a risk, we > >> can put those bios into a list and queue a work in workqueue to handle > >> them. Is it a better way? > > I don't see a deadlock with calling md_handle_request from md_flush_request. > > It's just more stack and overhead when we could instead let the first calls to > > these functions advance the I/O instead of recursing into new instances. > > > >> Regards > >> Xiao > > David Jeffery Sorry to jumping late. @Xiao, does David's patch look good to you? If so, please reply with your Reviewed-by and/or Tested-by. Thanks, Song