Re: [PATCH V3 1/1] MD: fix lock contention for flush bios

----- Original Message -----
> From: "Shaohua Li" <shli@xxxxxxxxxx>
> To: "Xiao Ni" <xni@xxxxxxxxxx>
> Cc: linux-raid@xxxxxxxxxxxxxxx, "ming lei" <ming.lei@xxxxxxxxxx>, ncroxon@xxxxxxxxxx, neilb@xxxxxxxx
> Sent: Monday, April 2, 2018 7:02:37 AM
> Subject: Re: [PATCH V3 1/1] MD: fix lock contention for flush bios
> 
> On Wed, Mar 21, 2018 at 02:47:22PM +0800, Xiao Ni wrote:
> > There is lock contention when many processes send flush bios to an md
> > device, e.g. when creating many LVs on one RAID device and running
> > mkfs.xfs on each LV.
> > 
> > Currently flush requests are handled sequentially: each one must wait for
> > mddev->flush_bio to become NULL before it can take mddev->lock.
> > 
> > This patch removes mddev->flush_bio and handles flush bios asynchronously.
> > I ran a test with "dbench -s 128 -t 4800"; the results are below:
> > 
> > =================Without the patch============================
> >  Operation                Count    AvgLat    MaxLat
> >  --------------------------------------------------
> >  Flush                     5239   142.590  3972.034
> >  Close                    53114     0.176   498.236
> >  LockX                      208     0.066     0.907
> >  Rename                    2793     0.335     7.203
> >  ReadX                    98100     0.020     2.280
> >  WriteX                   67800   555.649  8238.498
> >  Unlink                    7985     1.742   446.503
> >  UnlockX                    208     0.058     1.013
> >  FIND_FIRST               21035     0.141     3.147
> >  SET_FILE_INFORMATION      6419     0.090     1.539
> >  QUERY_FILE_INFORMATION   18244     0.007     0.130
> >  QUERY_PATH_INFORMATION   55622     0.060     3.884
> >  QUERY_FS_INFORMATION      9451     0.040     1.148
> >  NTCreateX                63960     0.717   536.542
> > 
> > Throughput 12.1782 MB/sec (sync open)  128 clients  128 procs
> > max_latency=8238.513 ms
> > 
> > =====================With the patch===========================
> >  Operation                Count    AvgLat    MaxLat
> >  --------------------------------------------------
> >  Flush                    34858    36.484   668.243
> >  Close                   379883     0.107   252.232
> >  LockX                     1792     0.048     1.070
> >  Rename                   21761     0.804   266.659
> >  ReadX                   817947     0.021    42.891
> >  WriteX                  254804   142.485   948.090
> >  Unlink                   99665     3.590   899.816
> >  UnlockX                   1792     0.056     1.240
> >  FIND_FIRST              178857     0.187    23.287
> >  SET_FILE_INFORMATION     41612     0.135    26.575
> >  QUERY_FILE_INFORMATION   83691     0.007     2.589
> >  QUERY_PATH_INFORMATION  470889     0.077    83.846
> >  QUERY_FS_INFORMATION     82764     0.056    10.368
> >  NTCreateX               512262     0.616   809.980
> > 
> > Throughput 53.6545 MB/sec (sync open)  128 clients  128 procs
> > max_latency=948.105 ms
> > 
> > V3:
> > Shaohua suggested that a mempool is overkill. In v3 the memory is
> > allocated when the RAID device is created, and a simple bitmap records
> > which resources are free.
> 
> Sorry for the delay. The bitmap method is still too complicated. Can we do
> something like this:
> 
> in mddev:
> struct flush_info flush_infos[8 or 16]; (maybe we don't even need a
> special struct)
> 
> Each flush_info can include a lock. Every time we handle a flush, we
> select a flush_info with flush_infos[jhash(flush_request_bio_address)];
> 
> we then take that lock and do whatever the current code does, but against
> the specific flush_info. Isn't this a simpler implementation? I think 8 or
> 16 locks should reduce most of the lock contention.
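
[For illustration, a minimal sketch of the hash-indexed table described
above; the field layout, lock type, table size of 16, and the jhash call
over the bio pointer are assumptions for this sketch, not taken from the
patch or the suggestion.]

#include <linux/bio.h>
#include <linux/jhash.h>
#include <linux/spinlock.h>

#define NR_FLUSH_INFOS 16

struct flush_info {
	spinlock_t	lock;
	struct bio	*flush_bio;	/* flush in progress on this slot, or NULL */
};

/* hypothetical field added to struct mddev:
 *	struct flush_info	flush_infos[NR_FLUSH_INFOS];
 */

static struct flush_info *md_pick_flush_info(struct mddev *mddev,
					     struct bio *bio)
{
	/* Hash the bio's address so concurrent flushes usually land on
	 * different slots, and therefore take different locks. */
	u32 h = jhash(&bio, sizeof(bio), 0);

	return &mddev->flush_infos[h % NR_FLUSH_INFOS];
}

Each slot has its own lock, so flushes that hash to different slots never
contend with each other.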

Hi Shaohua

There is a problem: if the hash method gives the same result for several
bios, those flush bios all have to wait on one flush_info, even while other
flush_infos are free. With the bitmap method, a flush request can take any
free flush_info in flush_infos[].
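
[For contrast, a rough sketch of a bitmap-based allocator of the kind V3
describes; the helper names, the wait queue, and the flush_busy bitmap
field are assumptions about the patch, not quoted from it.]

#include <linux/bitmap.h>
#include <linux/wait.h>

#define NR_FLUSH_INFOS 16

/* hypothetical per-mddev state:
 *	struct flush_info	flush_infos[NR_FLUSH_INFOS];
 *	DECLARE_BITMAP(flush_busy, NR_FLUSH_INFOS);
 *	wait_queue_head_t	flush_wait;
 */

static struct flush_info *md_get_free_flush_info(struct mddev *mddev)
{
	int i;

	for (;;) {
		/* Any free slot will do: one busy slot never forces a new
		 * flush to wait while other slots sit idle. */
		i = find_first_zero_bit(mddev->flush_busy, NR_FLUSH_INFOS);
		if (i < NR_FLUSH_INFOS &&
		    !test_and_set_bit(i, mddev->flush_busy))
			return &mddev->flush_infos[i];

		/* All slots busy (or we lost the race): wait for a release. */
		wait_event(mddev->flush_wait,
			   !bitmap_full(mddev->flush_busy, NR_FLUSH_INFOS));
	}
}

static void md_put_flush_info(struct mddev *mddev, struct flush_info *fi)
{
	clear_bit(fi - mddev->flush_infos, mddev->flush_busy);
	wake_up(&mddev->flush_wait);
}

Here a flush only waits when every slot is busy, rather than whenever its
hash bucket happens to be busy.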

Best Regards
Xiao



