Re: [PATCH 0/2] md: change approach to parallel flush requests.

On Wed, Mar 27 2019, Xiao Ni wrote:

> ----- Original Message -----
>> From: "NeilBrown" <neilb@xxxxxxxx>
>> To: "Xiao Ni" <xni@xxxxxxxxxx>, "Ming Lei" <ming.lei@xxxxxxxxxx>
>> Cc: linux-raid@xxxxxxxxxxxxxxx
>> Sent: Wednesday, March 27, 2019 8:19:28 AM
>> Subject: [PATCH 0/2] md: change approach to parallel flush requests.
>> 
>> Hi Xiao,
>>  I was looking at the issue of parallel flush requests being slow as a
>>  customer reported it against a kernel of ours that doesn't have your
>>  patch.
>> 
>>  I reviewed your patch more carefully and noticed a problem.  It calls
>>  submit_bio() multiple times from within a make_request_fn function.
>>  Calling submit_bio once is perfectly safe.  Calling it twice requires
>>  care and usually involves one of the calls submitting part of the
>>  original, and the other resubmitting the original with that part
>>  removed.
>>  Calling it more often is rarely safe.  The problem is that all these
>>  bios get queued on current->biolist and don't even start being
>>  processed until the top level make_request_fn completes.  This can
>>  result in deadlocks.
>
> Hi Neil
>
> Thanks for pointing this out.
> A: The flush bio from the file system is waiting at ret = q->make_request_fn(q, bio);
> B: md_flush_request() submits flush bios to the raid member disks.  These
> bios are added to bio_list_on_stack[0] and the call returns.
> A: make_request_fn returns, and the bios on bio_list_on_stack[0] are then
> handled by q->make_request_fn(q, bio); after handling those bios it returns.
>
> There is no deadlock, right?
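
For reference, the deferral you describe works roughly like this -- a
simplified sketch of the idea, not the actual generic_make_request() code:

	/*
	 * Simplified sketch: bios submitted from inside a ->make_request_fn
	 * are queued on current->bio_list and are not processed until the
	 * top-level call returns.
	 */
	void generic_make_request_sketch(struct request_queue *q, struct bio *bio)
	{
		struct bio_list pending;

		if (current->bio_list) {
			/* B: called from inside a make_request_fn -- defer */
			bio_list_add(current->bio_list, bio);
			return;
		}

		/* A: top-level entry */
		bio_list_init(&pending);
		current->bio_list = &pending;
		do {
			/* may queue more bios on 'pending' via the branch above */
			q->make_request_fn(q, bio);
		} while ((bio = bio_list_pop(&pending)) != NULL);
		current->bio_list = NULL;
	}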

The deadlock would happen when md_flush_request() calls mempool_alloc()
or bio_alloc_mddev().
If the kmalloc(GFP_NOIO) fails because there is no free memory, and if
all the objects pre-allocated to the mempool are trapped on the
bio_list_on_stack queue, then the allocation can block indefinitely.
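
To make that concrete, the problematic shape is roughly the following.
This is only a sketch of the pattern, not your actual patch: one GFP_NOIO
mempool allocation per member device, made while still inside the
top-level make_request_fn:

	/*
	 * Sketch of the problematic pattern: allocate a flush bio (from a
	 * mempool, GFP_NOIO) for every member device while still inside the
	 * top-level make_request_fn.  Each submit_bio() only parks the bio
	 * on current->bio_list, so the mempool objects cannot be returned
	 * until we get all the way back out -- which we may never do if the
	 * pool runs dry here.
	 */
	static void flush_all_members_sketch(struct mddev *mddev)
	{
		struct md_rdev *rdev;

		rdev_for_each(rdev, mddev) {
			struct bio *fbio = bio_alloc_mddev(GFP_NOIO, 0, mddev);

			bio_set_dev(fbio, rdev->bdev);
			fbio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
			submit_bio(fbio);	/* deferred, not started */
		}
	}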

>> 
>>  Also, with your patch there can be lots of flush requests all being
>>  sent at much the same time, which all do essentially the same
>>  thing.  This is fairly wasteful.
>> 
>>  I came up with a different approach.  I restored the old behaviour of
>>  flush requests being serialized, but now a caller will not bother
>>  sending a flush if some other caller submitted a flush request while
>>  it waited.  This should result in many fewer flush requests, and much
>>  less delay.
>
> It can avoid sending flush bios which arrive at the same time.  The one which
> gets the lock can do the flush job.  It's really a good idea.  But if most flush
> bios are sent to the md device sequentially, mddev->last_flush is usually before
> 'start'.  In this case there is still lock contention.

I'm not convinced that the problem is lock contention.  The problem that
you originally described was excessive latency.
This was caused by all the requests being serialized.
You fixed it by allowing requests to run in parallel.

I'm suggesting that we fix it by not running most of the requests.

The caller will have to wait for the flush to complete.  It rarely
matters whether it waits in generic_make_request() (as my solution will
often do), or after generic_make_request() has completed (as your
solution does).
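
For illustration, the skip test I have in mind looks roughly like this.
This is a sketch only, re-using the mddev->last_flush name from your
reply above; the real patch may differ in detail:

	/*
	 * Sketch: 'start' is when this flush request arrived;
	 * 'mddev->last_flush' is the start time of the most recent flush
	 * that has completed.  If such a flush started after this request
	 * arrived, everything written before this request was covered by
	 * it, so there is no need to issue another flush.
	 */
	static bool need_new_flush_sketch(struct mddev *mddev, ktime_t start)
	{
		bool need;

		spin_lock_irq(&mddev->lock);
		wait_event_lock_irq(mddev->sb_wait,
				    !mddev->flush_bio ||
				    ktime_after(mddev->last_flush, start),
				    mddev->lock);
		need = !ktime_after(mddev->last_flush, start);
		/* the actual patch would also need to claim mddev->flush_bio
		 * here, before dropping the lock, when 'need' is true */
		spin_unlock_irq(&mddev->lock);
		return need;
	}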

> Would it be better to use this patch together with my method?
>> 
>>  Would you be able to run your same test on this version to confirm
>>  that it fixes the performance issue, and if so review the code to see
>>  if you are happy with it?
>
> I'll do a test with this patch.  But it's a very simple environment.  Can you
> ask the customer to do the test too?

My customer has been testing this code for about 6 weeks, but they don't
have a reliable reproducer so are not yet certain that it makes a
difference.  We do know that it doesn't make things a lot worse.
As you have a reproducer, even if it is rather simple, testing against
it would increase my confidence in the code.

Thanks,
NeilBrown


>
> Regards
> Xiao
