Re: [PATCH 0/2] md: change approach to parallel flush requests.

On 03/28/2019 06:28 AM, NeilBrown wrote:
On Wed, Mar 27 2019, Xiao Ni wrote:

----- Original Message -----
From: "NeilBrown" <neilb@xxxxxxxx>
To: "Xiao Ni" <xni@xxxxxxxxxx>, "Ming Lei" <ming.lei@xxxxxxxxxx>
Cc: linux-raid@xxxxxxxxxxxxxxx
Sent: Wednesday, March 27, 2019 8:19:28 AM
Subject: [PATCH 0/2] md: change approach to parallel flush requests.

Hi Xiao,
  I was looking at the issue of parallel flush requests being slow as a
  customer reported it against a kernel of ours that doesn't have your
  patch.

  I reviewed your patch more carefully and noticed a problem.  It calls
  submit_bio() multiple times from within a make_request_fn function.
  Calling submit_bio once is perfectly safe.  Calling it twice requires
  care and usually involves one of the calls submitting part of the
  original, and the other resubmitting the original with that part
  removed.
  Calling it more often is rarely safe.  The problem is that all these
  bios get queued on current->bio_list and don't even start being
  processed until the top level make_request_fn completes.  This can
  result in deadlocks.
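
(For reference, the deferral described here comes from the recursion-avoidance
loop in generic_make_request().  The following is a simplified sketch of that
loop, not the exact kernel code; error handling and the second on-stack list
are omitted.)

blk_qc_t generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack[2];

	if (current->bio_list) {
		/* Already inside a ->make_request_fn: just queue the bio.
		 * It will not be looked at until that call returns. */
		bio_list_add(&current->bio_list[0], bio);
		return BLK_QC_T_NONE;
	}

	bio_list_init(&bio_list_on_stack[0]);
	current->bio_list = bio_list_on_stack;
	do {
		struct request_queue *q = bio->bi_disk->queue;

		/* Anything the driver submits from inside here only lands
		 * on bio_list_on_stack[0]; no progress is made yet. */
		q->make_request_fn(q, bio);

		bio = bio_list_pop(&bio_list_on_stack[0]);
	} while (bio);
	current->bio_list = NULL;
	return BLK_QC_T_NONE;
}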
Hi Neil

Thanks for pointing out this.
A: The flush bio from the file system is waiting at ret = q->make_request_fn(q, bio);
B: md_flush_request submits flush bios to the raid member disks. These bios
are added to bio_list_on_stack[0] and md_flush_request returns.
A: make_request_fn returns, and the bios on bio_list_on_stack[0] are handled
one by one with q->make_request_fn(q, bio). After handling those bios it returns.

There is no deadlock, right?
The deadlock would happen when md_flush_request() calls mempool_alloc()
or bio_alloc_mddev().
If the kmalloc(GFP_NOIO) fails because there is no free memory, and if
all the objects pre-allocated to the mempool are trapped on the
bio_list_on_stack queue, then this can block indefinitely.
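
Roughly, the pattern being discussed looks like the sketch below.  This is
not the actual patch: flush_all_rdevs() is a hypothetical helper, and real
code would also set bi_end_io and wait for the flushes to complete.  The
point is only where mempool_alloc() can block.

/* Hypothetical helper sketching one flush bio per member disk being
 * submitted from inside md's make_request path. */
static void flush_all_rdevs(struct mddev *mddev)
{
	struct md_rdev *rdev;

	rcu_read_lock();
	rdev_for_each_rcu(rdev, mddev) {
		struct bio *bi;

		atomic_inc(&rdev->nr_pending);
		rcu_read_unlock();

		/* If the mempool behind bio_alloc_mddev() is exhausted,
		 * this sleeps until a bio is freed -- but the bios already
		 * submitted above are parked on bio_list_on_stack and
		 * cannot complete until we return: a deadlock. */
		bi = bio_alloc_mddev(GFP_NOIO, 0, mddev);
		bi->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
		bio_set_dev(bi, rdev->bdev);
		submit_bio(bi);		/* only queued, not yet started */

		rcu_read_lock();
		rdev_dec_pending(rdev, mddev);
	}
	rcu_read_unlock();
}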

Thanks for pointing this out.

  Also, with your patch there can be lots of flush requests all being
  sent at much the same time, which all do essentially the same
  thing.  This is fairly wasteful.

  I came up with a different approach.  I restored the old behaviour of
  flush requests being serialized, but now a caller will not bother
  sending a flush if some other caller submitted a flush request while
  it waited.  This should result in many fewer flush requests, and much
  less delay.
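
Roughly, the idea can be sketched as below.  This is a simplified sketch
based on the description in this thread, not the exact patch:
mddev->last_flush and the per-call start time are what is discussed here,
while flush_bio, sb_wait, flush_work, submit_flushes and md_wq follow the
existing flush machinery in md.c.

/* Simplified sketch: serialize flushes, but skip ours if another flush
 * completed after we started waiting. */
void md_flush_request(struct mddev *mddev, struct bio *bio)
{
	ktime_t start = ktime_get_boottime();

	spin_lock_irq(&mddev->lock);
	/* Wait until no flush is in flight, or until a flush that started
	 * after us has completed. */
	wait_event_lock_irq(mddev->sb_wait,
			    !mddev->flush_bio ||
			    ktime_after(mddev->last_flush, start),
			    mddev->lock);
	if (ktime_after(mddev->last_flush, start)) {
		/* Someone else's flush finished after we arrived, so ours
		 * is redundant; just complete the bio.  (A bio carrying a
		 * data payload would instead be handed on to the
		 * personality for the data part.) */
		spin_unlock_irq(&mddev->lock);
		bio_endio(bio);
		return;
	}
	/* We own the flush: send PREFLUSH to all members from a workqueue;
	 * the completion path updates mddev->last_flush and wakes sb_wait. */
	mddev->flush_bio = bio;
	spin_unlock_irq(&mddev->lock);

	INIT_WORK(&mddev->flush_work, submit_flushes);
	queue_work(md_wq, &mddev->flush_work);
}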
It can avoid sending flush bios which arrive at the same time. The one which
gets the lock can do the flush job. It's really a good idea. But if most flush
bios are sent to the md device sequentially, mddev->last_flush is usually before
start. In this case there is still lock contention.
I'm not convinced that the problem is lock contention.  The problem that
you originally described was excessive latency.
This was caused by all the requests being serialized.
You fixed it by allowing requests to run in parallel.

I'm suggesting that we fix it by not running most of the requests.

The caller will have to wait for the flush to complete.  It rarely
matters if it waits in generic_make_request() (as my solution will often
do), or if it waits after generic_make_request() has completed (as your
solution does).

Would it be better to use this patch together with my method?
  Would you be able to run your same test on this version to confirm
  that it fixes the performance issue, and if so review the code to see
  if you are happy with it.
I'll do a test with this patch. But it's a very simple environment. Can you ask
the customer to do the test too?
My customer has been testing this code for about 6 weeks, but they don't
have a reliable reproducer so are not yet certain that it makes a
difference.  We do know that it doesn't make things a lot worse.
As you had a reproducer, even if it is rather simple, testing against
that would increase my confidence in that code.

I'll give the test result later.

Regards
Xiao

Thanks,
NeilBrown


Regards
Xiao



