On Tue, Sep 17, 2019 at 11:21 PM Xiao Ni <xni@xxxxxxxxxx> wrote: > > md_flush_request returns false when one flush bio has data and > pers->make_request function go > on handling it. For example the raid device is raid1. md_flush_request > returns false, raid1_make_request > go on handling the bio. If raid1_make_request fails, the bio is still > lost. Now it looks like only md_handle_request > checks the return value of pers->make_request and go on handling the bio > if pers->make_request fails. But the bio isn't lost. Using raid1 as an example, the calling sequence is md_handle_request -> raid1_make_request -> md_flush_request. raid1_make_request is already wrapped by md_handle_request. So this earlier call to md_handle_request will re-submit the bio if raid1_make_request returns false after md_flush_request returns false. Anything which calls an mddev->pers->make_request function (only md_handle_request after patch) must already handle a return of false or it would also have a bug allowing I/O to be lost. > > There should not be a deadlock if it calls md_handle_request directly. > Am I right? If there is a risk, we > can put those bios into a list and queue a work in workqueue to handle > them. Is it a better way? I don't see a deadlock with calling md_handle_request from md_flush_request. It's just more stack and overhead when we could instead let the first calls to these functions advance the I/O instead of recursing into new instances. > > Regards > Xiao David Jeffery