----- Original Message -----
> From: "Shaohua Li" <shli@xxxxxxxxxx>
> To: "Xiao Ni" <xni@xxxxxxxxxx>
> Cc: linux-raid@xxxxxxxxxxxxxxx, "ming lei" <ming.lei@xxxxxxxxxx>, ncroxon@xxxxxxxxxx, neilb@xxxxxxxx
> Sent: Wednesday, April 4, 2018 12:26:52 AM
> Subject: Re: [PATCH V3 1/1] MD: fix lock contention for flush bios
>
> On Mon, Apr 02, 2018 at 10:48:36PM -0400, Xiao Ni wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Shaohua Li" <shli@xxxxxxxxxx>
> > > To: "Xiao Ni" <xni@xxxxxxxxxx>
> > > Cc: linux-raid@xxxxxxxxxxxxxxx, "ming lei" <ming.lei@xxxxxxxxxx>, ncroxon@xxxxxxxxxx, neilb@xxxxxxxx
> > > Sent: Monday, April 2, 2018 7:02:37 AM
> > > Subject: Re: [PATCH V3 1/1] MD: fix lock contention for flush bios
> > >
> > > On Wed, Mar 21, 2018 at 02:47:22PM +0800, Xiao Ni wrote:
> > > > There is lock contention when many processes send flush bios to an
> > > > md device, e.g. when creating many LVs on one RAID device and
> > > > running mkfs.xfs on each LV.
> > > >
> > > > Flush requests are currently handled sequentially: a new flush has
> > > > to wait for mddev->flush_bio to become NULL before it can take
> > > > mddev->lock.
> > > >
> > > > This patch removes mddev->flush_bio and handles flush bios
> > > > asynchronously. I ran a test with dbench -s 128 -t 4800; results:
> > > >
> > > > =================Without the patch============================
> > > > Operation                Count    AvgLat    MaxLat
> > > > --------------------------------------------------
> > > > Flush                     5239   142.590  3972.034
> > > > Close                    53114     0.176   498.236
> > > > LockX                      208     0.066     0.907
> > > > Rename                    2793     0.335     7.203
> > > > ReadX                    98100     0.020     2.280
> > > > WriteX                   67800   555.649  8238.498
> > > > Unlink                    7985     1.742   446.503
> > > > UnlockX                    208     0.058     1.013
> > > > FIND_FIRST               21035     0.141     3.147
> > > > SET_FILE_INFORMATION      6419     0.090     1.539
> > > > QUERY_FILE_INFORMATION   18244     0.007     0.130
> > > > QUERY_PATH_INFORMATION   55622     0.060     3.884
> > > > QUERY_FS_INFORMATION      9451     0.040     1.148
> > > > NTCreateX                63960     0.717   536.542
> > > >
> > > > Throughput 12.1782 MB/sec (sync open) 128 clients 128 procs max_latency=8238.513 ms
> > > >
> > > > =====================With the patch===========================
> > > > Operation                Count    AvgLat    MaxLat
> > > > --------------------------------------------------
> > > > Flush                    34858    36.484   668.243
> > > > Close                   379883     0.107   252.232
> > > > LockX                     1792     0.048     1.070
> > > > Rename                   21761     0.804   266.659
> > > > ReadX                   817947     0.021    42.891
> > > > WriteX                  254804   142.485   948.090
> > > > Unlink                   99665     3.590   899.816
> > > > UnlockX                   1792     0.056     1.240
> > > > FIND_FIRST              178857     0.187    23.287
> > > > SET_FILE_INFORMATION     41612     0.135    26.575
> > > > QUERY_FILE_INFORMATION   83691     0.007     2.589
> > > > QUERY_PATH_INFORMATION  470889     0.077    83.846
> > > > QUERY_FS_INFORMATION     82764     0.056    10.368
> > > > NTCreateX               512262     0.616   809.980
> > > >
> > > > Throughput 53.6545 MB/sec (sync open) 128 clients 128 procs max_latency=948.105 ms
> > > >
> > > > V3:
> > > > Shaohua suggested that a mempool is overkill. V3 allocates the
> > > > memory when the raid device is created and uses a simple bitmap
> > > > to record which resources are free.
> > >
> > > Sorry for the delay. The bitmap method is still too complicated. Can
> > > we do something like this instead:
> > >
> > > in mddev:
> > >     struct flush_info flush_infos[8 or 16];
> > > Maybe we don't even need a special struct.
> > >
> > > The info can include a lock. Every time we handle a flush, we select
> > > a flush_info by flush_infos[jhash(flush_request_bio_address)];
> > >
> > > We then take the lock and do whatever the current code does against
> > > that specific flush_info. Isn't this a simpler implementation? I
> > > think 8 or 16 locks should reduce most of the lock contention.
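To make the striping concrete, here is a minimal sketch of that idea; the
names (NR_FLUSH_INFOS, md_flush_info_for_bio) are placeholders, not the
actual patch:

/* sketch for drivers/md/md.c: per-slot locks selected by hashing the bio */
#include <linux/jhash.h>
#include <linux/spinlock.h>

#define NR_FLUSH_INFOS 16

struct flush_info {
	spinlock_t	lock;
	struct bio	*flush_bio;	/* in-flight flush for this slot */
};

/* in struct mddev: struct flush_info flush_infos[NR_FLUSH_INFOS]; */

static struct flush_info *md_flush_info_for_bio(struct mddev *mddev,
						struct bio *bio)
{
	/* hash the (low bits of the) bio address; each bio maps to one slot */
	u32 idx = jhash_1word((u32)(unsigned long)bio, 0) % NR_FLUSH_INFOS;

	return &mddev->flush_infos[idx];
}

Each slot has its own lock, so two flushes contend only when they hash to
the same slot; with 8 or 16 slots most of the contention on the single
mddev->lock should disappear.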
> >
> > Hi Shaohua
> >
> > There is a problem: if the hash gives the same result for several
> > bios, those flush bios all wait on one flush_info even while other
> > flush_infos are free. With the bitmap method, a flush request can
> > take any free flush_info in flush_infos[].
>
> If the hash algorithm is good, this shouldn't be a problem. And the
> bitmap method needs a global lock, which we want to avoid, I think.

OK, I'm trying to write it with the jhash method. There is one other
problem: when the number of raid disks is updated, the flush_info buffer
needs to be re-allocated and mddev->flush_info replaced with the new
buffer. I'm not sure of the right way to protect that; if flush bios are
being submitted to the md device at the same time, it will cause
problems. Is it right to suspend the md device (mddev->suspended = 1)
while updating the raid disks? Something like the sketch below.

Regards
Xiao
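For reference, a rough sketch of that suspend/resume approach, assuming
the existing mddev_suspend()/mddev_resume() helpers; the resize helper
and the dynamically allocated mddev->flush_infos field are hypothetical:

static int md_resize_flush_infos(struct mddev *mddev, int new_nr)
{
	struct flush_info *new_infos, *old_infos;
	int i;

	new_infos = kcalloc(new_nr, sizeof(*new_infos), GFP_KERNEL);
	if (!new_infos)
		return -ENOMEM;
	for (i = 0; i < new_nr; i++)
		spin_lock_init(&new_infos[i].lock);

	/* quiesce the array: wait for in-flight I/O and block new bios */
	mddev_suspend(mddev);
	old_infos = mddev->flush_infos;
	mddev->flush_infos = new_infos;
	mddev_resume(mddev);

	kfree(old_infos);
	return 0;
}

While the device is suspended, no new flush bios can enter
md_flush_request() and the in-flight ones have completed, so swapping the
pointer needs no extra locking.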