Re: [PATCH V3 1/1] MD: fix lock contention for flush bios

On Mon, Apr 02, 2018 at 10:48:36PM -0400, Xiao Ni wrote:
> 
> 
> ----- Original Message -----
> > From: "Shaohua Li" <shli@xxxxxxxxxx>
> > To: "Xiao Ni" <xni@xxxxxxxxxx>
> > Cc: linux-raid@xxxxxxxxxxxxxxx, "ming lei" <ming.lei@xxxxxxxxxx>, ncroxon@xxxxxxxxxx, neilb@xxxxxxxx
> > Sent: Monday, April 2, 2018 7:02:37 AM
> > Subject: Re: [PATCH V3 1/1] MD: fix lock contention for flush bios
> > 
> > On Wed, Mar 21, 2018 at 02:47:22PM +0800, Xiao Ni wrote:
> > > There is lock contention when many processes send flush bios to an md
> > > device, e.g. when creating many LVs on one RAID device and running
> > > mkfs.xfs on each LV.
> > > 
> > > Currently flush requests can only be handled sequentially: each flush
> > > must take mddev->lock and wait for mddev->flush_bio to become NULL
> > > before it can proceed.
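For reference, the serialized path described above looks roughly like the
sketch below. It follows the shape of the pre-patch md_flush_request(), but
is not the verbatim md.c code:

    /*
     * Sketch (not verbatim md.c): every flush bio must wait until
     * mddev->flush_bio is NULL, so concurrent flushers serialize on a
     * single slot guarded by a single lock.
     */
    void md_flush_request(struct mddev *mddev, struct bio *bio)
    {
            spin_lock_irq(&mddev->lock);
            wait_event_lock_irq(mddev->sb_wait,
                                mddev->flush_bio == NULL,
                                mddev->lock);
            mddev->flush_bio = bio;     /* claim the one in-flight slot */
            spin_unlock_irq(&mddev->lock);

            /* submit PREFLUSH to the member devices from a work item */
            INIT_WORK(&mddev->flush_work, submit_flushes);
            queue_work(md_wq, &mddev->flush_work);
    }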
> > > 
> > > This patch removes mddev->flush_bio and handles flush bios
> > > asynchronously. I did a test with the command dbench -s 128 -t 4800.
> > > These are the test results:
> > > 
> > > =================Without the patch============================
> > >  Operation                Count    AvgLat    MaxLat
> > >  --------------------------------------------------
> > >  Flush                     5239   142.590  3972.034
> > >  Close                    53114     0.176   498.236
> > >  LockX                      208     0.066     0.907
> > >  Rename                    2793     0.335     7.203
> > >  ReadX                    98100     0.020     2.280
> > >  WriteX                   67800   555.649  8238.498
> > >  Unlink                    7985     1.742   446.503
> > >  UnlockX                    208     0.058     1.013
> > >  FIND_FIRST               21035     0.141     3.147
> > >  SET_FILE_INFORMATION      6419     0.090     1.539
> > >  QUERY_FILE_INFORMATION   18244     0.007     0.130
> > >  QUERY_PATH_INFORMATION   55622     0.060     3.884
> > >  QUERY_FS_INFORMATION      9451     0.040     1.148
> > >  NTCreateX                63960     0.717   536.542
> > > 
> > > Throughput 12.1782 MB/sec (sync open)  128 clients  128 procs
> > > max_latency=8238.513 ms
> > > 
> > > =====================With the patch===========================
> > >  Operation                Count    AvgLat    MaxLat
> > >  --------------------------------------------------
> > >  Flush                    34858    36.484   668.243
> > >  Close                   379883     0.107   252.232
> > >  LockX                     1792     0.048     1.070
> > >  Rename                   21761     0.804   266.659
> > >  ReadX                   817947     0.021    42.891
> > >  WriteX                  254804   142.485   948.090
> > >  Unlink                   99665     3.590   899.816
> > >  UnlockX                   1792     0.056     1.240
> > >  FIND_FIRST              178857     0.187    23.287
> > >  SET_FILE_INFORMATION     41612     0.135    26.575
> > >  QUERY_FILE_INFORMATION   83691     0.007     2.589
> > >  QUERY_PATH_INFORMATION  470889     0.077    83.846
> > >  QUERY_FS_INFORMATION     82764     0.056    10.368
> > >  NTCreateX               512262     0.616   809.980
> > > 
> > > Throughput 53.6545 MB/sec (sync open)  128 clients  128 procs
> > > max_latency=948.105 ms
> > > 
> > > V3:
> > > Shaohua suggested that a mempool is overkill. V3 allocates the memory
> > > when the RAID device is created and uses a simple bitmap to record
> > > which resources are free.
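As a rough illustration of the V3 scheme (the names flush_bitmap,
flush_infos and alloc_flush_info are hypothetical, not necessarily those in
the patch), slot allocation amounts to a bitmap search under one per-mddev
lock:

    #define NR_FLUSH_INFOS 16

    /* one preallocated slot per in-flight flush */
    struct flush_info {
            struct bio              *flush_bio;
            struct mddev            *mddev;
            struct work_struct      flush_work;
    };

    static struct flush_info *alloc_flush_info(struct mddev *mddev)
    {
            struct flush_info *fi = NULL;
            int bit;

            /* the bitmap is shared, so every flusher takes this lock */
            spin_lock_irq(&mddev->lock);
            bit = find_first_zero_bit(mddev->flush_bitmap, NR_FLUSH_INFOS);
            if (bit < NR_FLUSH_INFOS) {
                    set_bit(bit, mddev->flush_bitmap);
                    fi = &mddev->flush_infos[bit];
            }
            spin_unlock_irq(&mddev->lock);

            return fi;  /* NULL means all slots busy: caller must wait */
    }

Any free slot can be handed out, but the search itself still funnels every
flusher through a single lock, which is the point Shaohua raises below.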
> > 
> > Sorry for the delay. The bitmap method is still too complicated. Can we
> > do something like this instead:
> > 
> > in mddev:
> > struct flush_info flush_infos[8 or 16]; (maybe we don't even need a
> > special struct)
> > 
> > Each flush_info can include a lock. Every time we handle a flush, we
> > select a flush_info by flush_infos[jhash(flush_request_bio_address)];
> > 
> > We then take that lock and do whatever the current code does against the
> > specific flush_info. Isn't this a simpler implementation? I think 8 or 16
> > locks should eliminate most of the lock contention.
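A minimal sketch of this suggestion (hypothetical names; the bucket count
and per-bucket fields are assumptions, not an actual patch):

    #include <linux/jhash.h>

    #define NR_FLUSH_INFOS 16   /* "8 or 16" buckets */

    struct flush_info {
            spinlock_t      lock;        /* per-bucket, not per-mddev */
            struct bio      *flush_bio;  /* in-flight flush for this bucket */
    };

    static struct flush_info *select_flush_info(struct mddev *mddev,
                                                struct bio *bio)
    {
            /* hash the bio's address to spread flushers across buckets */
            u32 hash = jhash(&bio, sizeof(bio), 0);

            return &mddev->flush_infos[hash % NR_FLUSH_INFOS];
    }

Each bucket would then run the existing wait/claim logic under its own
fi->lock, so only flushers that hash to the same bucket contend with each
other.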
> 
> Hi Shaohua
> 
> There is a problem: if the hash gives the same result for different bios,
> those flush bios all wait on a single flush_info even while other
> flush_infos are free. With the bitmap method, a flush request can take any
> free flush_info in flush_infos[].

If the hash algorithm distributes the bios well, this shouldn't be a problem.
The bitmap method needs a global lock, which is what we want to avoid, I
think.

Thanks,
Shaohua


